Abstract

Linearizability is the gold standard among algorithm designers for deducing the correctness 
of a distributed algorithm using implemented shared objects from the correctness of the cor- 
responding algorithm using atomic versions of the same objects. We show that linearizability 
does not suffice for this purpose when processes can exploit randomization, and we discuss the 
existence of alternative correctness conditions. This paper makes the following contributions: 

• Various examples demonstrate that using well-known linearizable implementations of ob- 
jects (e.g., snapshots) in place of atomic objects can change the probability distribution 
of the outcomes that the adversary is able to generate. In some cases, an oblivious adver- 
sary can create a probability distribution of outcomes for an algorithm with implemented, 
linearizable objects, that not even a strong adversary can generate for the same algorithm 
with atomic objects. 

• A new correctness condition for shared object implementations, called strong linearizability, 
is defined. We prove that a strong adversary (i.e., one that sees the outcome of each coin 
flip immediately) gains no additional power when atomic objects are replaced by strongly 
linearizable implementations. In general, no strictly weaker correctness condition suffices 
to ensure this. We also show that strong linearizability is a local and composable property. 

• In contrast to the situation for the strong adversary, for a natural weaker adversary (one 
that cannot see a process' coin flip until its next operation on a shared object) we prove 
that there is no correspondingly general correctness condition. Specifically, any lineariz- 
able implementation of counters from atomic registers and load-linked/store-conditional 
objects, that satisfies a natural locality property, necessarily gives the weak adversary more 
power than it has with atomic counters. 



* Research conducted mostly during a postdoctoral fellowship at the University of Calgary. 

† Research partially supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.



1 Introduction 



Linearizability is the gold standard among algorithm designers for deducing the correctness of a 
distributed algorithm using implemented shared objects from the correctness of the corresponding 
algorithm using atomic¹ versions of the same objects. We explore this in more detail, showing that
linearizability does not suffice for this purpose when processes can exploit randomization. 

In an asynchronous distributed system, processes collaborate by executing an algorithm that 
applies operations to a collection of shared objects. If the operations on these objects are atomic, 
then the result of the execution is the same as some sequential execution that could arise from an 
arbitrary interleaving of the processes' steps. Alternatively, some objects could be replaced by a 
set of software methods for the different operations on those objects. Processes would then invoke 
the appropriate method in order to simulate the intended atomic operation. In this case, there is 
a finer granularity to the interleaving of process steps. Consequently, we need to be sure that each 
possible result (e.g., the algorithm's return value for each process) that can arise from using the 
software methods could also have arisen if the operations were atomic. 

This requirement is ensured if the methods provided for each object constitute a linearizable 
implementation [HW90] of the object. Linearizability is an especially useful and important
correctness condition because it is a local property. That is, if each object in a collection of objects
is replaced by its linearizable implementation, then the result of any execution that can arise from 
the concurrent use of the whole collection is one that could have also happened if the objects were 
atomic. 

Linearizable implementations, however, do not preserve the probability distribution of the pos- 
sible results as we transform the atomic system to the implemented one. An adversary, which 
schedules process steps, can "stretch out" a method call that was originally an atomic operation, 
and concurrently inspect the outcome of other processes' coin flips. Based on the outcomes, the 
scheduler can choose between alternative executions of the ongoing method call. As we will illus- 
trate through examples, the consequences of this additional flexibility can be powerful and subtle, 
allowing the behaviour of the implemented system to differ dramatically from that of the atomic 
system. In particular, the adversary can manipulate executions so that low-probability worst-case 
results in the atomic system become much more probable in the implemented system. 

We will see that our ability to curtail an adversary's additional power, which it can gain when 
atomic objects are replaced by linearizable implementations, depends in part upon the original 
power of the adversary. Various adversaries have been defined in the literature, differing in their ability
to base scheduling decisions on the random choices made by the algorithm (see [Asp03] for an
overview of adversary models). The main results in this paper concern two adversary models. 
Informally, when a process is scheduled by a strong adversary, the process executes only its next 
atomic operation, whether on a local or a shared object. (Coins are local objects.) When a 
process is scheduled by a weak adversary it executes up to and including its next step on a shared 
object. Thus, a strong adversary can intervene between a coin flip and the next step by the same 
process, whereas a weak adversary cannot. Further discussion of these adversaries, including formal 
definitions, appears in Section 3.

¹ In this paper, an atomic operation is one that happens instantaneously, i.e., it is indivisible. But in the literature,
the notion of atomicity is not used consistently. E.g., in her textbook [Lyn96], Lynch defines atomic objects to be
linearizable, but Anderson and Gouda [AG88] define atomicity in terms of instantaneous operations.
Summary of contributions

1. Several examples demonstrate that using linearizable implemented objects in place of atomic
objects in randomized algorithms allows the adversary to change the probability distribution of
results. Therefore, in order to safely use implemented objects in randomized algorithms, it does
not suffice to simply claim that these implementations are linearizable.

2. A new correctness condition for shared object implementations, called strong linearizability, 
which is strictly stronger than linearizability, is defined. We prove that a strong adversary against 
a randomized algorithm using strongly linearizable objects has exactly the same power as a strong 
adversary against the same algorithm using atomic objects. Conversely, if the set of histories 
that arise from a strong adversary scheduling an algorithm with implemented linearizable objects 
is "equivalent" to the set of histories that can arise from some strong adversary scheduling the 
same algorithm with atomic objects, then the former set of histories must be strongly linearizable. 
We also show that several known universal constructions of linearizable objects with common 
progress properties (e.g., wait-freedom) provide strong linearizability. Finally, we prove that strong 
linearizability, like linearizability, is both a local and a composable property. 

3. In contrast to the situation for strong adversaries, for weak adversaries strong linearizability 
has no counterpart. For example, for some randomized algorithms, weak adversaries always gain 
additional power when strong counters (that support fetch&inc and fetch&dec operations) are 
replaced with "natural" linearizable implementations based on a set of base objects supporting 
reads, writes and load-linked/store-conditional operations. Consequently, to prevent weak adversaries 
from gaining additional power, the implementation of the counter would require additional base 
object types beyond what is necessary for linearizability. This result is obtained by a technically 
involved proof; it holds even for randomized implementations with fairly weak progress conditions 
(e.g., lock-freedom).

Randomization has become an important technique in the design of distributed algorithms; it 
allows us to circumvent some substantial impossibilities and complexity lower bounds of determinis- 
tic algorithms. Our results impact the design of randomized algorithms that use shared objects not 
directly supported through atomic primitives in hardware. First, simulating the required shared 
objects in software using "only" linearizable implementations can break the algorithm. Second, 
such algorithms are much easier to fix (using strong linearizability) if they are designed from the 
outset to work against strong adversaries, but not so if they are designed only to work against weak 
adversaries. Third, since there are strongly linearizable universal constructions using consensus 
objects, which can be implemented using compare&swap, any system that provides compare&swap 
in hardware can implement any object in a strongly linearizable way. 

2 Examples 

We begin with two examples to provide intuition and motivation, and delay the model details, 
which are needed for our technical results, until the next section. The examples illustrate how an 
adversary in a randomized algorithm gains additional power when atomic objects are replaced with 
implemented ones. 

Atomic versus linearizable snapshots. An n-process snapshot object is a vector $(x_1, \dots, x_n)$
of length n that supports the atomic operations $\mathrm{UPDATE}_p$ and $\mathrm{SCAN}_p$ by any process $p \in \{1, \dots, n\}$.
Operation $\mathrm{UPDATE}_p(v)$ writes v to $x_p$ while leaving all $x_i$, $i \neq p$, unchanged; and $\mathrm{SCAN}_p()$ returns
the vector of values $(x_1, \dots, x_n)$ to p.

Initialize a snapshot object for three processes to $(x_p, x_q, x_r) = (0, 0, 0)$. Suppose the processes
p, q and r are executing the following code, and the adversary is trying to minimize the sum of the 
values returned in p's scan. 



p: SCAN_p()

r: UPDATE_r(2); UPDATE_r(0)

q: UPDATE_q(6); c := uniform-random{-1, 1}; UPDATE_q(8 · c)



To keep the sum in p's SCAN low, the adversary can schedule either both or neither of r's UPDATE
operations before p's SCAN. If the adversary is weak, the same holds for q's UPDATE operations.
Thus, under the best strategy for a weak adversary, the expected value of the sum in p's SCAN is 0.
If the adversary is strong, its best strategy is to schedule p's SCAN before q's second UPDATE if q's
coin flip returns 1 and after if it returns −1. Thus, under the best strategy for a strong adversary,
the expected value of the sum in p's SCAN is (6 − 8)/2 = −1.
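The two expected values above can be checked by exhaustive search over scheduling strategies. The following Python sketch is only an illustration under our own modelling assumptions (the function name `best_expected_sum` and the stage encoding are not from the paper): every scheduling decision is a minimizing choice, the coin flip is a chance node, and for the weak adversary the flip and the subsequent UPDATE by q are treated as one indivisible unit.

```python
from functools import lru_cache

def best_expected_sum(strong: bool) -> float:
    """Minimum expected sum of p's SCAN over all adversary strategies (atomic objects)."""

    @lru_cache(maxsize=None)
    def value(xq, xr, q_stage, r_stage, coin):
        # Option 1: schedule p's atomic SCAN now; x_p stays 0 throughout.
        options = [xq + xr]
        # Option 2: schedule r's next UPDATE (first writes 2, then 0).
        if r_stage == 0:
            options.append(value(xq, 2, q_stage, 1, coin))
        elif r_stage == 1:
            options.append(value(xq, 0, q_stage, 2, coin))
        # Option 3: schedule q's next step.
        if q_stage == 0:        # UPDATE_q(6)
            options.append(value(6, xr, 1, r_stage, coin))
        elif q_stage == 1:      # coin flip
            if strong:
                # The flip is a step of its own; the adversary sees c before moving on.
                outcomes = [value(xq, xr, 2, r_stage, c) for c in (-1, 1)]
            else:
                # Weak: the flip and the following UPDATE_q(8*c) form one unit.
                outcomes = [value(8 * c, xr, 3, r_stage, c) for c in (-1, 1)]
            options.append(sum(outcomes) / 2)
        elif q_stage == 2:      # UPDATE_q(8*c), only reachable for the strong adversary
            options.append(value(8 * coin, xr, 3, r_stage, coin))
        return min(options)

    return value(0, 0, 0, 0, 0)

weak_value = best_expected_sum(strong=False)
strong_value = best_expected_sum(strong=True)
```

Under these assumptions the search reproduces the values stated above: 0 for the weak adversary and −1 for the strong adversary.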

Now suppose instead that UPDATE and SCAN are implemented from atomic registers by the
well-known wait-free linearizable algorithm due to Afek, Attiya, Dolev, Gafni, Merritt and Shavit
[AAD+93]. In this algorithm, the snapshot object is implemented as an array A[1 : n] of registers.
Let a collect denote a series of n atomic reads, one for each element of A, in some fixed order. To
perform a SCAN, each process p repeatedly collects until either two successive collects are identical
(a successful double collect), or p observes that another process, say r, has executed at least two
UPDATE operations to A[r] during p's SCAN. In the second case, p returns the last SCAN written
(as we explain shortly) by r during an UPDATE (a borrowed scan). To perform an UPDATE, each
process r must first perform a SCAN and then write the result of the SCAN together with its update
argument into A[r]. This ensures that if a SCAN has enough failed double collects, then a borrowed
SCAN is possible. With this implementation, the adversary can maneuver p, q and r as shown in
Figure 1.
Figure 1: A "bad" scheduling using an implemented linearizable snapshot.

In this execution, r applies a SCAN that returns a view S with sum 2 as the first part of its second
UPDATE. Then, the adversary chooses where to schedule the remainder of r's second UPDATE, which
is the write to A[r] of (S, 0). If q's coin flip is −1, it schedules this write after p's third collect. In
this case, p will have a successful double collect, which returns a view with sum 2 + (−8) = −6.
If q's coin flip is 1, the adversary schedules r's write between p's second and third collects. In
this case, p will have a failed double collect but will have seen r update twice. Accordingly, p
borrows r's SCAN, and so p's SCAN also returns the view S with sum 2. Thus, the adversary can
force an expected sum in p's SCAN of only (−6 + 2)/2 = −2. Notice, furthermore, that only a weak
adversary was used to achieve this execution in the system with an implemented snapshot object.

Function Read()
    i := -1
    repeat i := i + 1 until A[i].read() = 1
    val := i
    for i = val - 1, ..., 0 do
        if A[i].read() = 1 then val := i
    end
    return val

Figure 2: Linearizable implementation of multivalued SRSW registers from atomic bits.

Atomic versus linearizable registers. Since the implemented method calls give the adversary 
more power than it has when operations are atomic, we might conjecture that this additional power 
could be curtailed by appropriately restricting the adversary. The next example shows that this is 
not always possible. 

Let R denote a multi-valued atomic single-reader/single-writer (SRSW) register initialized to 1.
Let processes w and p execute the following code:

w: R.WRITE(2); c := uniform-random{0, 2}; R.WRITE(c)
p: R.READ()

Suppose that a strong adversary is trying to minimize the value that p reads. Then the adversary's
best strategy is to have p execute its Read either before or after both of w's Write
operations. In either case, the expected value of p's Read is 1.

Now suppose, instead, that R is implemented using Vidyasankar's linearizable implementation
of single-reader/single-writer (SRSW) multivalued registers from SRSW atomic bits [Vid88]. In
this construction, an array A[0..ℓ] of SRSW binary registers is used to represent a register with
domain {0, ..., ℓ}. Value v is represented by A[v] = 1 and A[0] = ... = A[v − 1] = 0. The
implementation is shown in Figure 2.

Under this implementation, if the register is initialized with the value 1, the adversary's best
strategy is to schedule as follows: First p reads "up" seeing A[0] = 0 and then A[1] = 1. Next,
w takes all of its steps, then finally p takes its remaining steps where it reads "down". With
probability 1/2, w executed A[0].write(1) and p will return 0; with probability 1/2, w executed
A[2].write(1) and p will return 1. Hence the expected value returned by p's Read is 1/2.

In this example, the adversary makes all its scheduling decisions in advance; it does not exploit 
knowledge of the outcome of coin flips while the computation proceeds. Even reducing the power 
of the adversary from strong to this weakest oblivious one does not curtail its power sufficiently to 
retain the expected behaviour of the algorithm when R is an atomic register. 
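The oblivious schedule described above can be replayed step by step. The following Python sketch is an illustration only, under our own assumptions: each atomic bit access is modelled as one generator step, the register has domain {0, 1, 2}, and the helper names `write`, `read` and `run` are ours. It follows the Read and Write procedures of Figure 2.

```python
def write(A, v):
    """WRITE(v) from Figure 2: one generator step per atomic bit write."""
    A[v] = 1
    yield
    for i in range(v - 1, -1, -1):
        A[i] = 0
        yield

def read(A, out):
    """READ() from Figure 2: one generator step per atomic bit read."""
    i = 0
    while True:                 # read "up" until a 1 is found
        bit = A[i]
        yield
        if bit == 1:
            break
        i += 1
    val = i
    for j in range(val - 1, -1, -1):   # read "down" to index 0
        bit = A[j]
        yield
        if bit == 1:
            val = j
    out.append(val)

def run(coin):
    """Replay the fixed schedule: p reads up, w runs to completion, p reads down."""
    A = [0, 1, 0]               # bit array representing the initial value 1
    out = []
    p = read(A, out)
    next(p)                     # p reads A[0] = 0
    next(p)                     # p reads A[1] = 1
    for _ in write(A, 2):       # w: R.WRITE(2)
        pass
    for _ in write(A, coin):    # w: R.WRITE(c)
        pass
    for _ in p:                 # p takes its remaining steps
        pass
    return out[0]

expected = (run(0) + run(2)) / 2
```

The schedule is the same for both coin outcomes, so it corresponds to an oblivious adversary; in this model `run(0)` yields 0 and `run(2)` yields 1, giving the expected value 1/2 stated above.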



Function Write(v)
    A[v].write(1)
    for i = v - 1, ..., 0 do
        A[i].write(0)
    end
These examples motivate our central question: What is required to preserve the behaviour of
a randomized algorithm when atomic operations are replaced by method calls? The rest of this 
paper addresses this question. 

3 Model and Definitions 

We consider a distributed shared memory system consisting of a set $\mathcal{P}$ of n processes communicating
via a set of globally shared base objects. 

A shared object is an instance of a type, which supports some set of operations. Each such 
operation op consists of an invocation including operation arguments, denoted inv(op), and a
matching response including the return value, denoted rsp(op). A type is defined by a sequential
specification, which determines the set of sequences of operations that can occur on any object of
that type [HW90]. A sequence is valid for object O if it is in the sequential specification of the
type of O.

In this paper, we restrict ourselves to deterministic types (except for coin objects as described
below). I.e., if $op_1, \dots, op_k$ and $op_1, \dots, op_{k-1}, op'_k$ are valid sequences and $inv(op_k) = inv(op'_k)$,
then $rsp(op_k) = rsp(op'_k)$.

A process is a sequential thread of control that invokes operations on shared base objects 
and receives the responses of such operations. Processes also have access to independent random 
experiments. Let $\Omega$ be an arbitrary countable set, called the coin flip domain. A process step can
invoke a flip operation (with no arguments) on a coin object, which returns a coin flip in $\Omega$ as the
matching response.

An implementation of a target type T is a distributed method using other implemented or base 
objects. It takes as input the description of an operation invocation, and outputs a response, such 
that if multiple processes call the method multiple times sequentially, then the resulting sequence of 
method invocations and responses matches the sequential specification of T. An implementation is 
deterministic, if it uses no coin objects; in this paper we consider only deterministic implementations 
of types. An implemented object is a method that implements a type. 

Each individual process p executes its program by executing a sequence of operations on shared 
objects, where the first operation is fixed and the k-th operation invocation, k > 1, is a function of
the responses p received from the preceding k − 1 operations (including flip operations).

Steps of multiple processes interleave, resulting in a history H, which is a sequence of steps, 
i.e., invocations and responses corresponding to the operations executed by all processes on all base 
objects and all implemented objects. 

Thus, the projection of H onto the steps of any process, p, denoted H|p, is a sequence of steps
consistent with p's program. 

We say that an operation op is atomic in history H, if op's invocation is either the last step in 
H, or else is followed immediately in H by a matching response. (Note that in related literature,
an atomic operation is typically represented by a single event. However, for technical reasons that
become more clear in Section 3, the invocation/response representation is more convenient in this
paper.) Operations on implemented objects are never atomic, while operations on base objects may 
or may not be atomic. (We assume that an operation on an implemented object internally applies 
at least one base object operation.) A history H is sequential if all operations in H are atomic. 

A history, H, defines a partial happens before order $\prec_H$ on its operations, where, for operations
op and op', $op \prec_H op'$ if and only if in H the response of op occurs before the invocation of op'.
(The relation $\prec_H$ is a total order if and only if H is sequential.)
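As a small illustration of these definitions, the following Python sketch computes the happens before relation of a finite history and tests whether the history is sequential. The encoding of a history as a list of `("inv", op)` and `("rsp", op)` pairs and the function names are our own assumptions, not notation from the paper.

```python
def happens_before(history):
    """Return all pairs (a, b) such that rsp(a) occurs before inv(b) in the history."""
    pos = {step: i for i, step in enumerate(history)}
    ops = {op for _, op in history}
    return {
        (a, b)
        for a in ops
        for b in ops
        if ("rsp", a) in pos and ("inv", b) in pos
        and pos[("rsp", a)] < pos[("inv", b)]
    }

def is_sequential(history):
    """True if every invocation is the last step or is immediately followed by its response."""
    for i, (kind, op) in enumerate(history):
        if kind == "inv" and i + 1 < len(history) and history[i + 1] != ("rsp", op):
            return False
    return True

# Operations a and b overlap; c starts after both have responded.
H = [("inv", "a"), ("inv", "b"), ("rsp", "a"), ("rsp", "b"), ("inv", "c"), ("rsp", "c")]
S = [("inv", "a"), ("rsp", "a"), ("inv", "b"), ("rsp", "b")]
```

In `H` the overlapping operations a and b are unordered, so the relation is only partial; in the sequential history `S` every pair of distinct operations is ordered.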

A sequential history, H, is valid if, for any object O, the projection of H onto the steps associated
with O, denoted H|O, is in the sequential specification of the type of O. The new history formed
from concatenating history H to the end of history G is denoted $G \circ H$.

A history H that arises from an algorithm that uses an implemented object O can be interpreted
as a history $\Gamma(H)$ of the same algorithm using a base object of the same type: $\Gamma(H)$ is obtained
from H by omitting, for each operation op on O, say by process p, all the steps that appear in H|p
after the invocation inv(op) and before the matching response rsp(op). Thus, for each operation
op on O in $\Gamma(H)$, inv(op) corresponds to the method invocation that simulates operation op,
rsp(op) corresponds to the response of that method call, and all operations on the base objects
within the method call are omitted. The set of histories of an implementation is the set of histories
where processes access an object instantiated using the implementation (and no other implemented
object). If $\mathcal{H}$ is a set of histories, then $\Gamma(\mathcal{H}) = \{\Gamma(H) \mid H \in \mathcal{H}\}$ denotes the set of interpretations of
histories in $\mathcal{H}$.

For correctness, an interpreted history should "correspond" to one that could arise from an 
atomic object. This is captured by the correctness property called linearizability [HW90]. (Note 
that in the literature the term atomic object is sometimes used to denote a linearizable object, see
e.g. [Lyn96].) An operation, op, is complete in a history H if H contains both inv(op) and a
matching rsp(op). Since a process is a sequential thread of control, we see that every operation in
H|p, except possibly the last one, is complete. A linearization of a history H is a valid sequential
history H' that contains all completed operations of H and possibly some non-completed ones (with
matching responses added), and where $\prec_{H'}$ extends $\prec_H$. A history H is linearizable if it has at
least one linearization. (Note that a history containing operations on implemented objects is not 
linearizable in general because it encodes operations on base objects, but its interpretation might 
be linearizable.) 

An implementation of a shared object type is linearizable if its set of histories contains only 
histories whose interpretations are linearizable. 
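For small histories, linearizability can be checked by brute force. The Python sketch below is an illustration under simplifying assumptions of our own: the object is a single read/write register, every operation in the history is complete, and the encoding and the name `linearizations` are hypothetical. The example histories are the interpreted histories of the register example in Section 2, one for each coin outcome.

```python
from itertools import permutations

def linearizations(history, ops, initial):
    """Yield every linearization of a complete register history (brute force)."""
    pos = {step: i for i, step in enumerate(history)}
    for order in permutations(ops):
        rank = {op: i for i, op in enumerate(order)}
        # The sequential order must extend the happens before order of the history.
        respects_hb = all(
            rank[a] < rank[b]
            for a in ops
            for b in ops
            if pos[("rsp", a)] < pos[("inv", b)]
        )
        if not respects_hb:
            continue
        # The sequential order must be valid for a register.
        value, valid = initial, True
        for op in order:
            kind, arg = ops[op]
            if kind == "write":
                value = arg
            elif arg != value:      # a read must return the current value
                valid = False
                break
        if valid:
            yield order

# p's read r overlaps both writes of w; w1 completes before w2 starts.
H = [("inv", "r"), ("inv", "w1"), ("rsp", "w1"),
     ("inv", "w2"), ("rsp", "w2"), ("rsp", "r")]
coin0 = {"w1": ("write", 2), "w2": ("write", 0), "r": ("read", 0)}
coin2 = {"w1": ("write", 2), "w2": ("write", 2), "r": ("read", 1)}

lin0 = list(linearizations(H, coin0, initial=1))
lin2 = list(linearizations(H, coin2, initial=1))
```

Both histories are linearizable, but each has exactly one linearization, and the two place the read at opposite ends: after both writes when the coin is 0, and before both writes when the coin is 2.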

Flip operations on a coin object are always atomic, and return a value from the set $\Omega$ defined
earlier. A vector $c = (c_1, c_2, \dots) \in \Omega^\infty$ is called a coin flip vector. History H observes the coin
flip vector $c = (c_1, c_2, \dots)$, if the i-th flip operation in H returns value $c_i$. For a history H that
contains at least k flip operations, let H[k] denote the prefix of H that ends with the k-th invocation of a
flip operation; if fewer than k flips occur during H, then H[k] denotes H.

The order in which steps of processes interleave is given by a schedule, which is a (possibly
infinite) sequence of process IDs. History H observes schedule $\sigma = (\sigma_1, \sigma_2, \dots)$, if in H the i-th
step is one executed by process $\sigma_i$.

Schedules are generated by an adversary. Typically, adversaries take the past execution into 
account to schedule the next process. We are concerned primarily with two adversaries. Informally, 
a weak adversary cannot intervene between a flip operation and the next operation invocation by 
the same process. This means that in any history, any flip operation by a process p is immediately 
followed by an invocation step by p. In contrast, a strong adversary can use the response of the 
coin flip to determine which process takes the next step. The following definitions serve to unify 
these adversaries, and can easily be seen to capture these informal notions. An adversary is a 
mapping $A : \Omega^\infty \to \mathcal{P}^\infty$. An algorithm M together with an adversary A and a coin flip vector
$c = (c_1, c_2, \dots) \in \Omega^\infty$ generates the unique history, denoted $H_{M,A,c}$, that observes the schedule
A(c) and the coin flip vector c, and where all processes perform steps as dictated by M.
• An adversary without additional restrictions is called an offline adversary. (An offline adversary
can "see" all the coin flips in advance and can use them to make current scheduling
decisions.)

• Adversary A is strong for algorithm M if, for any two coin flip vectors c and d that have a
common prefix of length k, $H_{M,A,c}[k+1] = H_{M,A,d}[k+1]$. (A strong adversary cannot use
future coin flips to make current scheduling decisions.)

• Adversary A is weak for algorithm M if it is strong for algorithm M and is additionally constrained
so that, in $H_{M,A,c}$, every flip by process p is followed immediately by the invocation
of some operation by p. (A weak adversary cannot use future coin flips or the current coin
flip to make the next scheduling decision.)

• Adversary A is oblivious if A is a constant function, that is, A(c) is the same for all $c \in \Omega^\infty$.
(An oblivious adversary cannot use coin flips at all to make scheduling decisions.)
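The adversary classes can be tested mechanically on a toy instance. The Python sketch below is an illustration only, under assumptions of our own: coin flip vectors have length one, the coin flip domain is {−1, 1}, every schedule lists each process exactly once per step of its program, and all names (`PROGRAMS`, `history`, `prefix_k`, `is_strong`, `is_weak`, `is_oblivious`) are hypothetical. Non-flip operations are recorded as single events for brevity.

```python
from itertools import product

# Hypothetical toy algorithm: q flips one coin between two operations, p scans once.
PROGRAMS = {"q": ["UPDATE(6)", "flip", "UPDATE(8c)"], "p": ["SCAN"]}
DOMAIN = (-1, 1)

def history(schedule, coins):
    """Steps produced when processes run in `schedule` order; flips are atomic."""
    pc = {pid: 0 for pid in PROGRAMS}
    flips, H = 0, []
    for pid in schedule:
        step = PROGRAMS[pid][pc[pid]]
        pc[pid] += 1
        if step == "flip":
            H.append(("inv", pid, "flip"))
            H.append(("rsp", pid, coins[flips]))
            flips += 1
        else:
            H.append(("op", pid, step))
    return H

def prefix_k(H, k):
    """H[k]: the prefix ending with the k-th flip invocation, or all of H."""
    seen = 0
    for i, event in enumerate(H):
        if event[0] == "inv":
            seen += 1
            if seen == k:
                return H[: i + 1]
    return H

def coin_vectors(n=1):
    return list(product(DOMAIN, repeat=n))

def is_oblivious(adv):
    return len({adv(c) for c in coin_vectors()}) == 1

def is_strong(adv):
    for c, d in product(coin_vectors(), repeat=2):
        # k is the length of the longest common prefix of c and d.
        k = next((i for i, (a, b) in enumerate(zip(c, d)) if a != b), len(c))
        if prefix_k(history(adv(c), c), k + 1) != prefix_k(history(adv(d), d), k + 1):
            return False
    return True

def is_weak(adv):
    if not is_strong(adv):
        return False
    for c in coin_vectors():
        H = history(adv(c), c)
        for event, nxt in zip(H, H[1:]):
            # The step right after a flip response must belong to the same process.
            if event[0] == "rsp" and nxt[1] != event[1]:
                return False
    return True

offline_adv = lambda c: ("p", "q", "q", "q") if c[0] == 1 else ("q", "q", "q", "p")
strong_adv = lambda c: ("q", "q", "p", "q") if c[0] == 1 else ("q", "q", "q", "p")
oblivious_adv = lambda c: ("q", "q", "q", "p")
```

Here `offline_adv` uses the coin before it is flipped and so is not strong; `strong_adv` reacts to the flip right after its response, which makes it strong but not weak; `oblivious_adv` is constant and therefore oblivious, strong and weak.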

A strong adversary is commonly considered in the distributed algorithm literature. Our weak 
adversary is similar to other adversaries in the literature, such as that assumed by Chor, Israeli 
and Li [CIL87] and further discussed by Abrahamson [Abr88]. However, while their adversary
cannot intervene between flip operations and writes, it can intervene between flips and reads. 
(No other atomic operations are considered.) Our goal is to compare the behaviour of systems 
with atomic objects to those with implemented objects, for arbitrary objects that could support 
stronger operations than just reads and writes. Consequently, we assume that an adversary treats 
all operations consistently; it cannot intervene between a flip and some operations but not others. 
Furthermore, always binding a flip operation to the next step of the same process, instead of binding 
only if that next step is a write, serves to strengthen our impossibility result for weak adversaries 
in Section 5.

As we compare the powers of different adversaries in the remainder of the paper, we will refer 
repeatedly to the following notion of equivalence: 

Definition 3.1. Let M and M' be two algorithms and A and A' be two adversaries. We say that
(M, A) and (M', A') are equivalent if for any coin flip vector c, there exists a sequential history
that is a linearization of $\Gamma(H_{M,A,c})$ and of $\Gamma(H_{M',A',c})$.

Some of the results discussed in Sections 3 and 5 refer to well-known progress requirements.
An implementation of a shared object type is wait-free if in any history, each method call incurs 
a finite number of steps. An implementation is lock-free if in any history, either each method 
call takes finitely many steps, or else infinitely many method calls complete. An implementation is 
terminating if in any history, either each method call takes finitely many steps, or else some process 
that takes finitely many steps invokes a method call that it does not complete. 

4 Strong Linearizability

In this section, we discuss a novel technique for limiting the additional power a strong adversary
may gain against an algorithm when atomic objects used by the algorithm are replaced with
implemented objects.

We define a correctness property stronger than linearizability, called strong linearizability, and prove 
that under any strong adversary, strongly linearizable implementations of shared objects preserve 
the probability space of computations of an algorithm using such objects. We also show that strong 
linearizability maintains locality and composability, powerful properties that facilitate algorithm
design.

For a set of histories $\mathcal{H}$, let $\mathrm{close}(\mathcal{H})$ denote the prefix-closure of $\mathcal{H}$. That is, $G \in \mathrm{close}(\mathcal{H})$ if
and only if there is a sequence, S, of invocation and response steps such that $G \circ S \in \mathcal{H}$. (Recall
that the operator $\circ$ denotes concatenation.) Consider a function f that maps a set $\mathcal{H}$ of histories
to a set $\mathcal{H}'$ of histories. We say that f is prefix preserving, if for any two histories $G, H \in \mathcal{H}$, where
G is a prefix of H, f(G) is a prefix of f(H).

Definition 4.1. A set of histories $\mathcal{H}$ is strongly linearizable if there exists a function f mapping
histories in $\mathrm{close}(\mathcal{H})$ to sequential histories, such that

(L) for any $H \in \mathrm{close}(\mathcal{H})$, f(H) is a linearization of the interpreted history $\Gamma(H)$, and
(P) f is prefix-preserving.

A function satisfying properties (L) and (P) is called a strong linearization function for $\mathcal{H}$.

An implementation of a type is strongly linearizable if the set of histories formed by interpreting 
each history in the set of histories of the implementation is strongly linearizable. 
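Property (P) is easy to test on a finite set of histories. The Python sketch below is an illustration under our own assumptions: histories and sequential histories are encoded as tuples of labels, the names `is_prefix` and `is_prefix_preserving` are hypothetical, and the example reuses the interpreted histories of the register example in Section 2, taking as given that the full history for coin outcome 0 must be linearized as (w1, w2, r) and the one for coin outcome 2 as (r, w1, w2).

```python
def is_prefix(g, h):
    """True if tuple g is a prefix of tuple h."""
    return len(g) <= len(h) and h[: len(g)] == g

def is_prefix_preserving(f, histories):
    """Property (P): whenever G is a prefix of H, f(G) is a prefix of f(H)."""
    return all(
        is_prefix(f(g), f(h))
        for g in histories
        for h in histories
        if is_prefix(g, h)
    )

# G is the common prefix in which p's read is pending and w's first write is complete.
G = ("inv r", "inv w1", "rsp w1")
H0 = G + ("inv w2(0)", "rsp w2", "rsp r=0")   # coin outcome 0
H2 = G + ("inv w2(2)", "rsp w2", "rsp r=1")   # coin outcome 2

# Assumed forced linearizations of the two full histories.
forced = {H0: ("w1", "w2", "r"), H2: ("r", "w1", "w2")}

# Every linearization of G contains the completed write w1 and optionally the read r.
candidates_for_G = [("w1",), ("r", "w1"), ("w1", "r")]

def make_f(choice):
    table = dict(forced)
    table[G] = choice
    return table.__getitem__

results = [is_prefix_preserving(make_f(c), [G, H0, H2]) for c in candidates_for_G]
```

In this toy instance no choice for f(G) is a prefix of both forced linearizations, so no function on these three histories satisfies (L) and (P) together. This suggests why linearizability alone lets the adversary defer the choice of linearization order until after the coin flip.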

We emphasize some differences between the concepts of linearizability and strong linearizability:

1. In order to determine whether an implementation of a type is linearizable it suffices to look 
at every single history individually. However, property (P) from the definition of strong
linearizability is defined for sets of histories, so we have to consider all possible histories 
together. 

2. Linearizability is defined in terms of interpreted histories. I.e., it does not matter how an 
object is implemented, as long as all possible sequences of high-level invocations and re- 
sponses satisfy the linearizability property. For strong linearizability the low-level (i.e., non- 
interpreted) histories have to satisfy property (P), so the implementation of the object seems 
to be more important. 

In the following we consider sets of histories $\mathcal{H}$ that are generated by a (fixed) strong adversary A
for a given algorithm M. It will prove helpful to note that in this case it does not make a difference
for the strong linearizability of $\mathcal{H}$ whether $\mathcal{H}$ is a set of low-level histories or of interpreted histories:

Observation 4.2. Let M be an algorithm and A a strong adversary. Then $\mathcal{H} := \{H_{M,A,c} \mid c \in \Omega^\infty\}$
is strongly linearizable if and only if $\Gamma(\mathcal{H}) = \{\Gamma(H_{M,A,c}) \mid c \in \Omega^\infty\}$ is strongly
linearizable.

Proof. Assume w.l.o.g. that $\mathcal{H}$ is prefix-closed (and then so is $\Gamma(\mathcal{H})$). First note that for any
$H, H' \in \mathcal{H}$ the following is true:

If $\Gamma(H) = \Gamma(H')$, then either H is a prefix of H' or vice versa. (*)

Let G be the longest common prefix of H and of H'. For the purpose of a contradiction assume
that G is a proper prefix of H and of H'. Since H and H' are generated from the same algorithm
M and the scheduling of a strong adversary, G must end with the invocation of a flip operation fl.
Hence, the response of that flip operation is in H and H' but its return value is different in these
two histories. But since all object implementations are deterministic, fl must occur (and respond)
in $\Gamma(H) = \Gamma(H')$, a contradiction.
Now suppose that $\mathcal{H}$ is strongly linearizable. Let f be a strong linearization function for $\mathcal{H}$. For
any history $H' \in \Gamma(\mathcal{H})$ let $\max_{\mathcal{H}}(H')$ be the longest history in $\mathcal{H}$ with $\Gamma(\max_{\mathcal{H}}(H')) = H'$. (By (*)
all histories H with $\Gamma(H) = H'$ are prefixes of $\max_{\mathcal{H}}(H')$.) Define $f'(H') = f(\max_{\mathcal{H}}(H'))$. Then f'
is a strong linearization function of $\Gamma(\mathcal{H})$: If G' is a prefix of H', for $G', H' \in \Gamma(\mathcal{H})$, then $\max_{\mathcal{H}}(G')$ is
a prefix of $\max_{\mathcal{H}}(H')$ and so $f'(G') = f(\max_{\mathcal{H}}(G'))$ is a prefix of $f'(H') = f(\max_{\mathcal{H}}(H'))$. Moreover,
since f satisfies property (L), f'(H') is a linearization of $\Gamma(\max_{\mathcal{H}}(H')) = H'$. Thus, f' satisfies
properties (P) and (L).

Now suppose that $\Gamma(\mathcal{H})$ is strongly linearizable and that g is a strong linearization function for
it. For each history $H \in \mathcal{H}$ we define $g'(H) := g(\Gamma(H))$. Then it is immediate that g' inherits
properties (P) and (L) from g, so g' is a strong linearization function for $\mathcal{H}$. □

4.1 Strong Linearizability is Necessary 

In the following we show that if the power of the set of (at most) strong adversaries is not enhanced 
by the implemented objects, then the algorithm using implemented objects generates a strongly 
linearizable set of histories. 

Theorem 4.3. Let M be an algorithm that uses only atomic objects, and let M' be the algorithm
obtained from M by replacing some objects with linearizable implementations. Further, let A' be
an adversary that is strong for M'. If there exists an adversary A that is strong for M such that
(M, A) and (M', A') are equivalent, then $\mathcal{H}' = \{H_{M',A',c} \mid c \in \Omega^\infty\}$ is strongly linearizable.

Due to Observation 4.2, it suffices to consider only interpreted histories when proving this
theorem. Thus, in the following proof we only consider interpreted histories. For ease of
notation we simply write H instead of $\Gamma(H)$ for every history H considered.

Let Ti* = close (?^'). Since {A,Ai) and {A',J^') are equivalent, and all histories of Ai are 
sequential, each history H' = Hj^i j^i ^, where c G has a linearization i{H') = H^j^^. Let 
H' e n' and let G' e Ti* he a prefix of H' . Define g{G',H') to be the shortest prefix of i(H') that 
contains all operations that complete in G' . Two claims help clarify the proof of Theorem 14.31 

Claim 4.4. G = g{G' , H') is a linearization of G' . 

Proof. Suppose that $op \prec_{G'} op'$. Since $G'$ is a prefix of $H'$, $op \prec_{H'} op'$ holds, and thus
$op' \not\prec_H op$ for the linearization $H = \ell(H')$ of $H'$. History $G$ is a prefix of $H$, and so
$op' \not\prec_G op$.

By construction, $G$ contains all completed operations from $G'$, and so it suffices to show that if
$G$ contains an operation, then that operation's invocation occurs also in $G'$. For contradiction let
$op$ be any operation in $G$ such that $inv(op)$ does not occur in $G'$. By construction, some operation
$op'$ must follow $op$ in $G$, and so $op'$ completes in $G'$ (otherwise, $G$ would not be the shortest prefix
of $H$ that contains all operations that complete in $G'$). However, since $op \prec_G op'$ and $G$ is a prefix
of $H$, we know that $op \prec_H op'$ and thus $op' \not\prec_{H'} op$. Hence, in $H'$ the step $rsp(op')$ occurs
only after $inv(op)$. But then, since $G'$ is a prefix of $H'$ that contains $rsp(op')$, $inv(op)$ must occur
in $G'$ as well, a contradiction. □
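The construction of $g(G', H')$ can be made concrete with a small sketch. The event encoding below (pairs of `"inv"`/`"rsp"` and an operation name), the function names, and the sample history are illustrative assumptions, not notation from the paper; the linearization $\ell(H')$ is simply passed in as a list of operation names.

```python
def completed_ops(history):
    """Return the operations whose response event occurs in `history`."""
    return {op for kind, op in history if kind == "rsp"}


def g(prefix_history, linearization):
    """Shortest prefix of `linearization` containing every operation completed in `prefix_history`.

    `prefix_history` is a list of ("inv" | "rsp", op) events (the history G');
    `linearization` is the sequential history l(H'), given as a list of operation names.
    """
    needed = completed_ops(prefix_history)
    cut = 0
    for index, op in enumerate(linearization):
        if op in needed:
            cut = index + 1  # the prefix must reach at least this operation
    return linearization[:cut]


# Hypothetical history H': a and b overlap, and b responds first.
H_prime = [("inv", "a"), ("inv", "b"), ("rsp", "b"), ("rsp", "a")]
ell = ["a", "b"]            # one possible linearization of H'
G_prime = H_prime[:3]       # prefix in which only b has completed

print(g(G_prime, ell))      # ['a', 'b']
print(g(H_prime[:2], ell))  # []
```

In the first call the pending operation `a` is included because it precedes the completed operation `b` in the linearization; its invocation does occur in the prefix, as Claim 4.4 requires.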



Claim 4.5. Let $H'_c = H_{A',\mathcal{M}',c} \in \mathcal{H}'$ and $H'_d = H_{A',\mathcal{M}',d} \in \mathcal{H}'$, and let
$G' \in \mathcal{H}^*$ be a common prefix of both $H'_c$ and $H'_d$. Then $g(G', H'_c) = g(G', H'_d)$.

Proof. The claim is trivially true if $H'_c = H'_d$, so assume that $H'_c \neq H'_d$. Suppose that the longest
common prefix of $c$ and $d$ has length $k$. Since $G'$ is a common prefix of $H'_c$ and $H'_d$, it cannot contain
the response of the $(k+1)$-th flip operation. Thus, $G'$ is a prefix of both $H'_c[k+1]$ and $H'_d[k+1]$,
and furthermore $H'_c[k+1] = H'_d[k+1]$ since $A'$ is a strong adversary. Let $G_c = g(G', H'_c)$ and
$G_d = g(G', H'_d)$. We will show that neither $G_c$ nor $G_d$ contains the $(k+1)$-th coin flip. Suppose
for contradiction that $G_c$ does. (The proof for $G_d$ is analogous.) Since the coin flip is not complete
in $G'$, by construction of $G_c$ some operation $op$ that is complete in $G'$ must follow the coin flip in
$G_c$. Since $G_c$ is a linearization of $G'$ by Claim 4.4, the invocation of the coin flip must precede the
response of $op$ in $G'$. But that contradicts $G'$ being a prefix of $H'_c[k+1]$, which ends with the coin
flip's invocation. Now since $G_c$ does not contain the $(k+1)$-th coin flip, it is a prefix not only of
$\ell(H'_c)$ but also of $\ell(H'_c[k+1])$, and similarly $G_d$ is a prefix of $\ell(H'_d[k+1])$. Since
$H'_c[k+1] = H'_d[k+1]$ holds, as noted earlier, this implies that $G_c = G_d$. □

Proof of Theorem 4.3. For all histories $G' \in \mathcal{H}^*$, we define $f(G') = g(G', H')$, where $H'$ is an
arbitrary history in $\mathcal{H}'$ such that $G'$ is a prefix of $H'$. (By Claim 4.5, all such histories $H'$ yield
the same $g(G', H')$.) We show that $f$ is a strong linearization function for $\mathcal{H}'$. By Claim 4.4, $f$
satisfies property (L), so it suffices to show that it also satisfies property (P). Let
$F', G' \in \mathcal{H}^*$ be such that $F'$ is a prefix of $G'$. Choose an arbitrary history $H' \in \mathcal{H}'$ such
that $G'$ is a prefix of $H'$. By construction and Claim 4.5, $f(G')$ and $f(F')$ are the shortest prefixes of
$\ell(H')$ that contain all completed operations in $G'$ and $F'$, respectively. Since the set of completed
operations in $F'$ is a subset of the completed operations in $G'$, $f(F')$ is a prefix of $f(G')$,
completing the proof of Theorem 4.3. □

4.2 Strong Linearizability is Sufficient for the Strong Adversary 

Under strong linearizability the strong adversary is prevented from using the outcome of the flip 
to schedule future events in such a way that they influence the order of past operations in a 
linearization, because, once a coin is flipped, the operations that precede the coin flip in the 
linearization are already determined. This is made precise in the following theorem, the proof of
which appears in Section 4.5.

Theorem 4.6. Let $\mathcal{M}$ be an algorithm that uses only atomic objects, and let $\mathcal{M}'$ be the algorithm
obtained from $\mathcal{M}$ by replacing some atomic objects with strongly linearizable implementations.
For any adversary $A'$ that is strong for $\mathcal{M}'$, there exists an adversary $A$ that is strong for
$\mathcal{M}$, such that $(\mathcal{M}, A)$ and $(\mathcal{M}', A')$ are equivalent.

The proof is postponed to Section 4.5, after the necessary tools have been developed.

4.3 Normalized Strong Linearizations. 

Let $\mathcal{H}_{\mathcal{M},A}$ denote the set of all interpreted histories that are generated by an algorithm
$\mathcal{M}$ and the adversary $A$ over all coin flip vectors. That is,
$\mathcal{H}_{\mathcal{M},A} = \{\Gamma(H_{\mathcal{M},A,c}) \mid c \in \Omega^\infty\}$. A natural way to
try to prove Theorem 4.6 would be to apply the strong linearization function to each history in the
set $\mathcal{H}_{\mathcal{M}',A'}$ to obtain a set $\mathcal{H}$ of linearizations of $\mathcal{H}_{\mathcal{M}',A'}$. Then it
would suffice to prove that there is a strong adversary $A$ that can generate the histories in $\mathcal{H}$.

Unfortunately, this is not always possible. For example, consider an algorithm, $\mathcal{M}$, where
process $p$ (respectively, $q$) executes the single operation $op_p$ (resp. $op_q$) and process $r$ first executes
$op_r$ and then executes a flip operation, $cf_r$. Suppose a strong adversary schedules an implementation
of $\mathcal{M}$ so that, for coin flip $i \in \{0, 1\}$, it produces the history:

$H'_i = inv(op_p), inv(op_q), inv(op_r), rsp(op_r), inv(cf_r), rsp(cf_r, i), rsp(op_p), rsp(op_q)$

Let $G'$ be the common prefix of $H'_0$ and $H'_1$ that ends with $inv(cf_r)$. Define the function $f$ on
$\{G', H'_0, H'_1\}$ by:

$f(G') = inv(op_r), rsp(op_r)$

$f(H'_0) = inv(op_r), rsp(op_r), inv(op_p), rsp(op_p), inv(op_q), rsp(op_q), inv(cf_r), rsp(cf_r, 0)$

$f(H'_1) = inv(op_r), rsp(op_r), inv(cf_r), rsp(cf_r, 1), inv(op_q), rsp(op_q), inv(op_p), rsp(op_p)$

Then, according to Definition 4.1, $f$ can be extended to a strong linearization function for
$\{H'_0, H'_1\}$, but the histories $f(H'_0)$ and $f(H'_1)$ cannot both be produced by the same strong
adversary: in $f(H'_0)$ the operation $op_p$ is scheduled before the coin flip, although the choice between
the two histories depends on the outcome of that flip. This difficulty is remedied by proving that whenever
such a problematic strong linearization function for a set of histories $\mathcal{H}$ occurs, there is another
strong linearization function for $\mathcal{H}$ that avoids this problem. The idea is to move coin flips in
$f(H)$ to the earliest point possible, without violating the happens-before order of $H$. The following
definition and technical lemma make this precise.

Definition 4.7. Let $\mathcal{H}$ be a strongly linearizable set of histories. A normalized strong linearization
function for $\mathcal{H}$ is any strong linearization function $f^*$ for $\mathcal{H}$ such that:

(N) for any history $H \in \mathrm{close}(\mathcal{H})$, if in $f^*(H)$ some flip operation $cf$ immediately follows
some other operation $op$, then $op \prec_H cf$.

Lemma 4.8. For any strongly linearizable set $\mathcal{H}$ of histories there exists a normalized strong
linearization function.

Proof. Let $f$ be a strong linearization function for $\mathcal{H}$. Let $H \in \mathrm{close}(\mathcal{H})$ be an arbitrary
history and let $cf_1, cf_2, \ldots$ be the flip operations in $H$, occurring in that order. We obtain $f^*(H)$ from
$f(H)$ as follows:

1. Let $H' = f(H)$. As long as the response of the last operation $op$ in $H'$ does not appear in $H$,
remove $op$ from $H'$.

2. Remove all flip operations from $H'$.

3. For $i = 1, 2, \ldots$, insert $cf_i$ at the earliest possible position in $H'$, where it doesn't violate $\prec_H$.
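Steps 1 to 3 above can be sketched as follows, as an illustration rather than the paper's own formulation. The encoding is an assumption: a history is a list of `("inv" | "rsp", op)` events with return values omitted, a linearization is a list of operation names, and only the flips that survive Step 1 are re-inserted in Step 3. The sample data is the history $H'_i$ from the example in this subsection.

```python
def happens_before(history, a, b):
    """True if rsp(a) occurs before inv(b) in `history`."""
    pos = {event: i for i, event in enumerate(history)}
    return (("rsp", a) in pos and ("inv", b) in pos
            and pos[("rsp", a)] < pos[("inv", b)])


def normalize(history, linearization, flips):
    """Apply Steps 1-3 to `linearization`, a given linearization f(history)."""
    completed = {op for kind, op in history if kind == "rsp"}
    seq = list(linearization)
    # Step 1: drop trailing operations whose response does not appear in the history.
    while seq and seq[-1] not in completed:
        seq.pop()
    # Step 2: remove all flip operations, remembering their relative order.
    removed_flips = [op for op in seq if op in flips]
    seq = [op for op in seq if op not in flips]
    # Step 3: re-insert each flip right after the last operation that happens before it.
    for cf in removed_flips:
        position = 0
        for i, op in enumerate(seq):
            if happens_before(history, op, cf):
                position = i + 1
        seq.insert(position, cf)
    return seq


# The history H'_i from the example (the flip's return value is omitted).
H = [("inv", "op_p"), ("inv", "op_q"), ("inv", "op_r"), ("rsp", "op_r"),
     ("inv", "cf_r"), ("rsp", "cf_r"), ("rsp", "op_p"), ("rsp", "op_q")]
f_H0 = ["op_r", "op_p", "op_q", "cf_r"]
f_H1 = ["op_r", "cf_r", "op_q", "op_p"]

print(normalize(H, f_H0, {"cf_r"}))  # ['op_r', 'cf_r', 'op_p', 'op_q']
print(normalize(H, f_H1, {"cf_r"}))  # ['op_r', 'cf_r', 'op_q', 'op_p']
```

After normalization both linearizations agree up to and including the coin flip, which is the situation a single strong adversary can produce.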

Let $f'(H)$ be the history obtained from $f(H)$ after execution of Step 1. It is immediate from
Step 1 that $f'(H)$ satisfies the following:

The response of the last operation in $f'(H)$ occurs in $H$. (*)

Moreover, $f'$ satisfies property (L): If $f(H)$ is a linearization of $\Gamma(H)$ that ends with an operation
$op$ which does not complete in $H$, then we can remove $op$ without destroying validity or changing
the order of any of the remaining pairs of operations in $f(H)$. Similarly, property (P) is maintained:
Since removals from histories occur only at the end, such removals can only violate property (P)
for a history $H$ with prefix $G$, if an operation $op$ is removed from $f(H)$ and $op$ occurs also in $f(G)$
but is not removed from $f(G)$. If that were to happen, then $f(G) = f(H)$ since $f(G)$ is a prefix
of $f(H)$ and $op$ is the last operation of both histories. But if $op$ is removed from the end of $f(H)$,
then $op$ does not complete during $H$. And then $op$ does not complete in $G$, either, and so it is
removed from $f(G)$, too.

Now suppose that $f'$ satisfies (L), (P), and (*) after Step 1. We show that after Steps 2 and 3,
the resulting mapping, which we call $f^*$, satisfies (L), (P), and (N).

First we show that $f^*$ satisfies (N): Suppose in Step 3, the flip operation $cf_i$ is inserted
immediately after some other operation $op$. Then $op \prec_H cf_i$, or else $cf_i$ could have been inserted
before $op$. For all flip operations $cf_j$ that are inserted after $cf_i$, $cf_i \prec_H cf_j$ holds, and so they
are inserted into $H'$ behind $cf_i$. Thus, at the end of Step 3, $cf_i$ immediately follows $op$.

Next we show that $f^*$ satisfies (L), i.e., that $f^*(H)$ is a linearization of $\Gamma(H)$. Since $f'(H)$ and
$f^*(H)$ differ only in the position of the (atomic) flips, and the relative order of flips is preserved,
$f^*(H)$ is valid. Now consider two operations $op, op'$ that both appear in $f^*(H)$, and where
$op \prec_H op'$. Then $op$ and $op'$ both occur in $f'(H)$, and since $f'$ satisfies (L), $op \prec_{f'(H)} op'$.
If neither of the two operations is a flip, then their order is preserved when we construct $f^*(H)$. If one
of the operations is a flip, then the insertion rule guarantees that the flip operation is inserted in such a
way that it does not violate $\prec_H$. Finally, only operations that don't complete in $H$ are removed
from $f(H)$ in Step 1, so $f'(H)$ and thus also $f^*(H)$ contain all operations that complete in $H$.

It remains to show that $f^*$ satisfies (P). Let $G$ be an arbitrary prefix of $H$ and let $O$ be the
set of non-flip operations in $H$. Further, let $O_i = O \cup \{cf_1, \ldots, cf_i\}$, for $0 \le i \le z$, where $z$
is the number of flip operations in $H$. We show by induction on $i$, $0 \le i \le z$, that

$f^*(G)|O_i$ is a prefix of $f^*(H)|O_i$.

First consider the base case, $i = 0$. Note that $O_0 = O$. By construction, $f^*(G)$ contains exactly
the same set of operations as $f'(G)$, and $f^*(H)$ contains the same set of operations as $f'(H)$. Since
the order of non-flip operations does not change during Steps 2 and 3, $f^*(G)|O = f'(G)|O$, and
$f^*(H)|O = f'(H)|O$. Since $f'$ satisfies (P), it follows that $f^*(G)|O$ is a prefix of $f^*(H)|O$.

Now suppose $i \ge 1$ and that $f^*(G)|O_{i-1}$ is a prefix of $f^*(H)|O_{i-1}$. Let $G^* := f^*(G)|O_i$ and
$H^* := f^*(H)|O_i$. Below, we prove the following statement:

for any operation $op$ in $G^*$: $cf_i \prec_{H^*} op \iff cf_i \prec_{G^*} op$. (1)

Thus, if $cf_i$ appears in both $G^*$ and $H^*$, it appears in the same position. On the other hand, if $cf_i$
does not appear in $G^*$, then $cf_i \not\prec_{G^*} op$ for all operations $op$ that appear in $G^*$; hence, in $H^*$,
$cf_i$ is not inserted before any operation $op$ that appears in $G^*$. Therefore, from (1) we can conclude
that $G^*$ is a prefix of $H^*$, and so it suffices to prove (1).

First suppose $cf_i \prec_{G^*} op$, and for the purpose of a contradiction assume $cf_i \not\prec_{H^*} op$. Since all
operations in $G^*$ occur also in $H^*$, $op \prec_{H^*} cf_i$. Let $op'$ be the operation that immediately precedes
$cf_i$ in $H^*$. Then $op \preceq_{H^*} op'$.¹ By the semantics of Step 3, $op' \prec_H cf_i$, or else $cf_i$ would have
been inserted in front of $op'$. Since $cf_i$ completes during $G$, and $G$ is a prefix of $H$, $op'$ completes
during $G$ as well. Hence, $op' \prec_G cf_i$ and since $G^*$ is a linearization of $G|O_i$, $op' \prec_{G^*} cf_i$. By
the induction hypothesis, from $op \preceq_{H^*} op'$, we have $op \preceq_{G^*} op'$ and thus by transitivity
$op \prec_{G^*} cf_i$, a contradiction.

Now suppose $cf_i \prec_{H^*} op$. For the purpose of a contradiction assume that $cf_i \not\prec_{G^*} op$. Then
either $cf_i$ does not occur in $G^*$ at all or $op \prec_{G^*} cf_i$. First consider the latter case. Let $op'$ be the
last operation in $G^*$ that precedes $cf_i$, so $op \preceq_{G^*} op'$. Since $cf_i$ was not inserted in front of $op'$, by
the semantics of Step 3 we have $op' \prec_G cf_i$, and hence $op' \prec_H cf_i$ because $G$ is a prefix of $H$.
Since $H^*$ is a linearization of $H|O_i$, $op' \prec_{H^*} cf_i$. By the induction hypothesis, $op \preceq_{H^*} op'$,
and so by transitivity $op \prec_{H^*} cf_i$, a contradiction.

¹ We write $a \preceq b$ to denote "either $a \prec b$ or $a = b$".

Finally, assume that $cf_i$ is not in $G^*$. Let $op'$ be the last operation in $G^*$. If $op'$ is a flip, then it
appears atomic in $G$, and thus completes in $G$. Otherwise, $op'$ is also the last operation in $f'(G)$,
and thus by (*), $op'$ completes in $G$. On the other hand, since $cf_i$ is atomic but does not appear
in $G^*$, it does not appear in $G$, either. Since $G$ is a prefix of $H$, it follows that $op' \prec_H cf_i$. But then,
since $H^*$ is a linearization of $H|O_i$, $op' \prec_{H^*} cf_i$. Since $op$ is in $G^*$, but $op'$ is the last operation
in $G^*$, $op \preceq_{G^*} op'$. By the induction hypothesis, $G^*|\{op, op'\}$ is a prefix of $H^*|\{op, op'\}$, and so
$op \preceq_{H^*} op' \prec_{H^*} cf_i$, a contradiction. □

4.4 Strong Linearizability is a Local Property. 

We could now proceed with the proof of Theorem 4.6. However, the proof is simplified by exploiting
a locality property for strong linearizability. Herlihy and Wing proved the following locality
property for linearizability [HW90]: A history $H$ over multiple shared objects is linearizable if, for
each such object $O$, the history $H|O$ is linearizable. Thus, to establish that any result that can
arise from an algorithm using implemented objects is a result that could be produced by the same
algorithm using atomic objects, it suffices to show separately for each shared object, $O$, that the
implementation of any algorithm over $O$ is linearizable. The analogous property for strong
linearizability is given in the following theorem. For a shared implemented object $O$ and a history $H$, we
denote by $H \| O$ the history $H$ projected to all steps that any process executes while performing
an operation on $O$, i.e., all steps executed by the process during an interval that starts with an
invocation on $O$ and ends with the matching response on $O$.
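The projection $H \| O$ can be illustrated with a short sketch. The step encoding is an assumption made for this example: each step is a triple `(process, kind, target)`, where `kind` is `"inv"` or `"rsp"` for invocations and responses on an implemented object and `"step"` for an access to a base object. The sketch also assumes that each process performs at most one operation on an implemented object at a time.

```python
def project_steps(history, obj):
    """Return H || obj: all steps a process takes between an invocation on `obj`
    and the matching response on `obj` (both inclusive)."""
    active = set()   # processes currently inside an operation on `obj`
    projected = []
    for step in history:
        process, kind, target = step
        if kind == "inv" and target == obj:
            active.add(process)
            projected.append(step)
        elif kind == "rsp" and target == obj:
            projected.append(step)
            active.discard(process)
        elif process in active:
            projected.append(step)  # base-object step taken on behalf of `obj`
    return projected


# Hypothetical history: p operates on O1, q on O2, and both touch base register R.
H = [("p", "inv", "O1"), ("q", "inv", "O2"), ("p", "step", "R"),
     ("q", "step", "R"), ("p", "rsp", "O1"), ("q", "rsp", "O2")]

print(project_steps(H, "O1"))
# [('p', 'inv', 'O1'), ('p', 'step', 'R'), ('p', 'rsp', 'O1')]
```

Note that the projection is taken per process interval and not per base object, so a base object shared by two implementations contributes steps to both projections.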

Theorem 4.9. A set of histories $\mathcal{H}$ over implemented objects $O_1, O_2, \ldots, O_n$ is strongly linearizable
if for each object $O_i$, $1 \le i \le n$, the set $\mathcal{H}_i = \{(H \| O_i) \mid H \in \mathcal{H}\}$ is strongly linearizable.

Remark 4.10. Typically one would apply the theorem for objects $O_1, \ldots, O_n$ that are implemented
from distinct sets $\mathcal{B}_1, \ldots, \mathcal{B}_n$ of base objects. This is not required for the correctness of the
theorem, though. However, suppose that $n = 2$ and the implementations of $O_1$ and $O_2$ share the same base
object $B$. It is easy to construct a history $H$ such that $(H \| O_1)|B$ and $(H \| O_2)|B$ are valid, but
$H|B$ is not. Or, it might be the case that $H|B$ is valid, but $(H \| O_1)|B$ is not. However, even if
$\mathcal{H}$ contains such a history, the theorem is correct, because a strong linearization function $f$ maps
$H$ to a history $f(H)$ that does not contain any steps on $B$.

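Proof of Theorem 4.9. Let $H^{(k)}$ denote the prefix of history $H$ that has length $\min\{k, |H|\}$, and
let $\mathcal{H}^{(k)}$ denote the set of histories in $\mathrm{close}(\mathcal{H})$ of length at most $k$. For each object
$O_i$, let $f_i$ be a strong linearization function for $\mathcal{H}_i$. We construct, by induction on $k \ge 0$,
functions $f^{(k)}$ such that for every history $H \in \mathrm{close}(\mathcal{H})$:

(a) $f^{(k)}(H)|O_i = f_i(H^{(k)} \| O_i)$ for every object $O_i$, $1 \le i \le n$; and

(b) $f^{(k)}$ is a strong linearization function for $\mathcal{H}^{(k)}$; and

(c) $f^{(k)}(H) = f^{(k)}(H^{(k)})$; and

(d) if $k \ge 1$, then $f^{(k-1)}(H)$ is a prefix of $f^{(k)}(H)$.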

For the basis, define $f^{(0)}(H) = \varepsilon$, where $\varepsilon$ denotes the empty history. Properties (a) through
(d) clearly hold for $f^{(0)}$. Now consider $k \ge 1$ and suppose that $f^{(k-1)}$ satisfies (a)-(d). Construct
$f^{(k)}$ as follows. For any history $H$, if $|H| < k$ then define $f^{(k)}(H) = f^{(k-1)}(H)$. Otherwise let $\beta$
be the $k$-th step of $H$ and suppose that $\beta$ is a step in $H \| O_j$. Since $f_j$ is a strong linearization
function for $\mathcal{H}_j$, $f_j(H^{(k-1)} \| O_j)$ is a prefix of $f_j(H^{(k)} \| O_j)$. Consequently, there is a
sub-history $\lambda$ (possibly empty) satisfying

$$f_j(H^{(k)} \| O_j) = f_j(H^{(k-1)} \| O_j) \circ \lambda. \qquad (2)$$

For this case (i.e., $|H| \ge k$) define

$$f^{(k)}(H) = f^{(k-1)}(H) \circ \lambda. \qquad (3)$$

For all $k \in \mathbb{N} \cup \{0\}$, property (c) is satisfied because $f^{(k)}(H)$ is uniquely determined by the
prefix of length $\min\{k, |H|\}$ of $H$. Property (d) follows immediately from (3). We now show that
properties (a) and (b) are preserved by the inductive step.

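Property (a): First consider an object $O_i$, where $i \neq j$. Since $\beta$ and all events in $\lambda$ belong to
operations on $O_j$, we get, using (3) and the induction hypothesis:

$$f^{(k)}(H)|O_i = (f^{(k-1)}(H) \circ \lambda)|O_i = f^{(k-1)}(H)|O_i = f_i(H^{(k-1)} \| O_i)
= f_i((H^{(k-1)} \circ \beta) \| O_i) = f_i(H^{(k)} \| O_i).$$

Now consider the object $O_j$. By definition of $\lambda$, the induction hypothesis, and construction of $f^{(k)}$:

$$f_j(H^{(k)} \| O_j) = f_j(H^{(k-1)} \| O_j) \circ \lambda = (f^{(k-1)}(H)|O_j) \circ \lambda
= (f^{(k-1)}(H) \circ \lambda)|O_j = f^{(k)}(H)|O_j.$$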



Property (b): We first show that the restriction of $f^{(k)}$ to $\mathcal{H}^{(k)}$ satisfies property (P). Let $H$ be
an arbitrary history of length at most $k$, and let $G$ be a prefix of $H$. If $|G| = k$, then $G = H$ so
trivially $f^{(k)}(G)$ is a prefix of $f^{(k)}(H)$. Now consider the case $|G| < k$, and thus $G = G^{(k-1)}$. Then
by construction, $f^{(k-1)}(G) = f^{(k)}(G)$, and so

$$f^{(k)}(G) = f^{(k-1)}(G) = f^{(k-1)}(G^{(k-1)}),$$

which by the induction hypothesis is a prefix of

$$f^{(k-1)}(H^{(k-1)}) = f^{(k-1)}(H).$$

By (d), this is a prefix of $f^{(k)}(H)$.

We now prove that $f^{(k)}$ satisfies property (L) for every history $H \in \mathcal{H}^{(k)}$. Consider an operation
$op$ on some object $O_i$ that completes during $H$ and thus also during $\Gamma(H)$. Then $op$ completes
during $\Gamma(H \| O_i)$, and also during its linearization $f_i(H \| O_i)$. Thus, by (a) $op$ completes during
$f^{(k)}(H)$. Now consider two operations $op_1, op_2$ in $\Gamma(H)$ such that $op_1 \prec_H op_2$. For the purpose
of a contradiction, assume that $op_2$ appears before $op_1$ in $f^{(k)}(H)$. First consider the case that
$op_1$ and $op_2$ both appear in $f^{(k-1)}(H)$. Since by (c) this history is equal to $f^{(k-1)}(H^{(k-1)})$, we
conclude from (b) that $op_1$ precedes $op_2$ in $f^{(k-1)}(H)$. But then by (d), this must also be the case
in $f^{(k)}(H)$, a contradiction.

Now assume that not both of $op_1$ and $op_2$ occur in $f^{(k-1)}(H)$. Then by (d) and the assumption
that $op_2$ precedes $op_1$ in $f^{(k)}(H)$, we know that $op_1$ does not appear in $f^{(k-1)}(H)$. Hence, according
to (3) we can write $f^{(k)}(H) = f^{(k-1)}(H) \circ \lambda$, where $op_1$ appears in $\lambda$. By the induction
hypothesis for (b), $op_1$ does not complete during $H^{(k-1)}$. From the assumption $op_1 \prec_H op_2$, it follows
that $op_2$ does not get invoked during $H^{(k-1)}$. Hence, $op_2$ cannot occur in
$f^{(k-1)}(H) = f^{(k-1)}(H^{(k-1)})$ either or else $f^{(k-1)}$ would violate (b). Therefore, $op_1$ and $op_2$ must
both have events in $\lambda$. By construction, $\lambda$ is a sub-history of $\Gamma(H \| O_i)$ for one object $O_i$.
Hence, $op_1$ and $op_2$ are both operations on the same object $O_i$ and so $op_1 \prec_{H \| O_i} op_2$. On the
other hand, by the assumption that $op_2$ precedes $op_1$ in $f^{(k)}(H)$ and since by (a)
$f^{(k)}(H)|O_i = f_i(H^{(k)} \| O_i)$ and since $f_i$ satisfies (P), $op_2$ occurs before $op_1$ in $f_i(H \| O_i)$.
But then $f_i(H \| O_i)$ is not a linearization of $H \| O_i$, contradicting the assumption that $f_i$ is a
strong linearization function for $\mathcal{H}_i$.

So we have proved that the functions $f^{(k)}$ satisfy properties (a) through (d).

The final step of the proof is to use the functions $f^{(k)}$ to define the function $f$ so that it is
a strong linearization function for $\mathcal{H}$ (which may contain infinite histories). By property (d), for
$k > 0$, there is a sequence $\zeta^{(k)}(H)$ satisfying $f^{(k-1)}(H) \circ \zeta^{(k)}(H) = f^{(k)}(H)$. We use this
sequence to define $f$ as follows:

$$f(H) = \begin{cases}
f^{(|H|)}(H) & \text{if } |H| \text{ is finite, and} \\
f^{(0)}(H) \circ \zeta^{(1)}(H) \circ \zeta^{(2)}(H) \circ \zeta^{(3)}(H) \circ \cdots & \text{if } |H| = \infty.
\end{cases}$$

(Note that if $|H|$ is finite, then $f(H) = f^{(0)}(H) \circ \zeta^{(1)}(H) \circ \cdots \circ \zeta^{(|H|)}(H)$.)

It now remains to confirm that $f$ satisfies properties (L) and (P) of Definition 4.1.
If $G$ is a prefix of length $i$ of some history $H \in \mathcal{H}$, then

$$f(G) = f^{(i)}(G) = f^{(i)}(H) = f^{(0)}(H) \circ \zeta^{(1)}(H) \circ \cdots \circ \zeta^{(i)}(H)$$

is a prefix of $f(H)$. Hence, $f$ satisfies property (P).

To show that $f$ satisfies property (L) consider an arbitrary history $H \in \mathcal{H}$. First note that
every operation $op$ that completes in $\Gamma(H)$ completes in $f(H)$, too: Let $j$ be the position of the
response of $op$ in $H$. Then by (b) and (c), $f^{(j)}(H) = f^{(j)}(H^{(j)})$ is a linearization of $H^{(j)}$. By (d),
$f^{(j)}(H)$ is a prefix of $f(H)$ that contains $op$, since $op$ completes in $\Gamma(H^{(j)})$.

Now suppose that $f(H)$ is not a linearization of $\Gamma(H)$. Then there is a finite prefix $G'$ of $f(H)$,
such that either $\prec_{G'}$ is not compatible with $\prec_H$, or $G'$ is not valid. Let $k$ be an arbitrary large
enough integer such that $G'$ is a prefix of $f^{(k)}(H)$. By (c), $f^{(k)}(H) = f^{(k)}(H^{(k)})$, and so by (b),
$f^{(k)}(H)$ extends $\prec_{H^{(k)}}$ and is valid. Clearly, the same is true for every prefix of $f(H^{(k)})$. Hence,
$G'$ is valid and extends $\prec_{H^{(k)}}$, and so $G'$ also is compatible with $\prec_H$. We conclude that $f(H)$ is a
linearization of $H$. □

4.5 Proof that Strong Linearizability is Sufficient for the Strong Adversary.

We now have the tools to prove the core theorem concerning correctness under the strong adversary. 

Proof of Theorem 4.6. Let $\mathcal{H}' = \{H_{\mathcal{M}',A',c} \mid c \in \Omega^\infty\}$. By Theorem 4.9, since all
object implementations are strongly linearizable, $\mathcal{H}'$ is strongly linearizable. By Lemma 4.8 there is a
normalized strong linearization function, say $f$, for $\mathcal{H}'$. For each coin flip vector $c$, let
$H'_c = H_{\mathcal{M}',A',c}$ and $H_c = f(H'_c)$. The set of all pairs $(c, H_c)$ with $c \in \Omega^\infty$ defines an
adversary $A$, where $A(c)$ is the schedule observed by $H_c$. Then $H_c = H_{\mathcal{M},A,c}$ for all
$c \in \Omega^\infty$, and $\Gamma(H_c)$ and $\Gamma(H'_c)$ have a common linearization (namely $H_c$). Hence,
$(\mathcal{M}, A)$ and $(\mathcal{M}', A')$ are equivalent.

It remains to prove that $A$ is strong for $\mathcal{M}$. That is, for any two coin flip vectors,
$c = (c_1, c_2, \ldots)$ and $d = (d_1, d_2, \ldots)$, $c_i = d_i$ for $1 \le i \le k$ implies $H_c[k+1] = H_d[k+1]$.

Since $A'$ is a strong adversary, it follows that $H'_c[k+1] = H'_d[k+1]$. If $H'_c$ contains fewer than
$k+1$ flip operations, then $H'_c = H'_c[k+1] = H'_d[k+1] = H'_d$, and so the claim holds.

Now suppose that $H'_c$ contains at least $k+1$ flip operations. Let $cf$ be the $(k+1)$-st flip in $H'_c$
and $inv(cf)$ its invocation. Since $H_c$ is a linearization of $\Gamma(H'_c)$, and since coin flips are atomic, the
$(k+1)$-st flip operation in $H_c$ is also $cf$, and $inv(cf)$ is the last step in $H_c[k+1]$.

Let $G'$ and $G$ be the prefixes of $H'_c[k+1]$ and $H_c[k+1]$, respectively, each of which ends with
the last step preceding $inv(cf)$. Since $G'$ is a prefix of $H'_c$, property (P) for $f$ (see Definition 4.1)
ensures that $f(G')$ is a prefix of $f(H'_c) = H_c$. Thus $G$ and $f(G')$ are both prefixes of $H_c$. We will
now show that the last operation of $G$ is in $f(G')$ and the last operation of $f(G')$ is in $G$, from
which it follows that $f(G') = G$.

Let $op$ be the last operation in $G$. Then in $H_c$, operation $op$ immediately precedes the flip
operation $cf$. Consequently, property (N) (see Definition 4.7) ensures that $op \prec_{H'_c} cf$, and so $op$
completes during $G'$. Since $f(G')$ is a linearization of $G'$, $op$ appears in $f(G')$ as well.

Now consider the last operation $op'$ in $f(G')$. Since $f$ has property (P) and coin flips appear
atomic, $f(G')$ is a prefix of $H = f(G' \circ inv(cf) \circ rsp(cf))$. There can be no operation $op''$ between
$op'$ and $cf$ in $H$, because if there were, $op''$ would not complete in $f(G')$ and thus could not satisfy
$op'' \prec_{H'_c} cf$, contradicting property (N) for $f$. Thus, $op'$ immediately precedes $cf$ in $H$, and hence
also in $f(H'_c)$. Then by Definition 4.7, $op' \prec_{H'_c} cf$. Therefore, $op'$ precedes $cf$ also in
$f(H'_c) = H_c$, and so $op'$ appears in $G$.

Thus, we conclude that $f(G') = G$, and hence $f(G') \circ inv(cf) = G \circ inv(cf) = H_c[k+1]$. By
symmetry we also have $f(G') \circ inv(cf) = H_d[k+1]$, which completes the proof. □

4.6 Timed Executions, Linearization Points, and Composability 

A timed execution is a (possibly infinite) sequence of pairs $E = ((s_1, t_1), (s_2, t_2), \ldots)$, where each
$s_i$ is an invocation or response step and each $t_i$ is a number in $\mathbb{R}$ satisfying

• $t_1 \le t_2 \le \cdots$, and

• if $t_i = t_{i+1}$, then $s_{i+1}$ is the response of an (atomic) operation with invocation $s_i$.

The timed execution $E$ corresponds to a history $H(E) = (s_1, s_2, \ldots)$ together with a timing function
$t_E : \{s_1, s_2, \ldots\} \to \mathbb{R}$, $t_E(s_i) = t_i$. We say that step $s_i$ occurs at time $t_E(s_i)$ in execution $E$. If
an operation $op$ is atomic in $H(E)$, then we say that $op$ occurs atomically at time $t_E(inv(op))$.
Every operation $op$ of the timed execution $E$ can be associated with an interval
$I_E(op) \subseteq \mathbb{R} \cup \{\infty\}$, where $I_E(op) = [t_E(inv(op)), t_E(rsp(op))]$, if $rsp(op)$ occurs in $E$, and
$I_E(op) = [t_E(inv(op)), \infty]$, otherwise.

For a timed execution $E = ((s_1, t_1), (s_2, t_2), \ldots)$ let $\Gamma(E)$ denote the interpretation of $E$, i.e.,
$\Gamma(E)$ is the timed execution that contains only the pairs $(s_j, t_j)$, where $s_j \in \Gamma(H(E))$. Define
$\Phi(H)$ to be the set of operations whose invocations occur in history $H$, and for a timed execution $E$ let
$\Phi(E) = \Phi(H(E))$.

Now consider an arbitrary timed execution $E$ and the corresponding history $H = H(E)$. It is
well-known that a valid sequential history $L$ is a linearization of $\Gamma(H)$ if and only if every operation
$op \in \Phi(\Gamma(E))$ can be mapped to a point $pt(op) \in \mathbb{R} \cup \{\infty\}$, such that for any
$op \in \Phi(\Gamma(E))$ and any distinct $op_1, op_2 \in \Phi(L)$:

(a) $pt(op) \in I_E(op)$, and

(b) if $op_1 \prec_L op_2$, then $pt(op_1) < pt(op_2)$.

If $pt : \Phi(\Gamma(E)) \to \mathbb{R} \cup \{\infty\}$ is a mapping satisfying both (a) and (b), then we say that $pt$ maps
the operations $op \in \Phi(\Gamma(E))$ to their linearization points. Note that by (a) all linearization points
$pt(op)$, with $pt(op) < \infty$, are distinct. If $pt$ maps the operations in $\Gamma(E)$ to their linearization
points, then we denote by $L(E, pt)$ the timed execution, where every operation $op \in \Phi(E)$ with
$pt(op) < \infty$ occurs atomically at time $pt(op)$.
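Conditions (a) and (b) are easy to check mechanically. The following sketch assumes a simplified encoding that is not part of the paper: `intervals` maps each operation to its interval $I_E(op)$ as a pair (with `math.inf` for a pending operation), `order` lists the operations of the sequential history $L$, and `pt` maps operations to candidate linearization points.

```python
import math


def maps_to_linearization_points(intervals, order, pt):
    """Check conditions (a) and (b) for a candidate point assignment `pt`."""
    # (a) every point lies within the interval of its operation
    for op, (start, end) in intervals.items():
        if not (start <= pt[op] <= end):
            return False
    # (b) points strictly increase along the sequential history L;
    #     checking consecutive pairs suffices because < is transitive
    for earlier, later in zip(order, order[1:]):
        if not (pt[earlier] < pt[later]):
            return False
    return True


# Hypothetical execution: a runs during [0, 5], b during [1, 2], c is pending from time 4.
intervals = {"a": (0, 5), "b": (1, 2), "c": (4, math.inf)}

print(maps_to_linearization_points(intervals, ["b", "a"], {"a": 3, "b": 1.5, "c": math.inf}))  # True
print(maps_to_linearization_points(intervals, ["a", "b"], {"a": 3, "b": 1.5, "c": math.inf}))  # False
```

The second call fails condition (b): with these points `b` takes effect before `a`, so the order `a, b` is not the one they induce.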

Two timed executions $E, E'$ are isomorphic, if $H(E) = H(E')$. For a set $\mathcal{E}$ of executions, let
$\mathrm{close}(\mathcal{E})$ be the set of all executions $E'$ that are isomorphic to a prefix of some execution in $\mathcal{E}$.

Definition 4.11. Let $\mathcal{E}$ be a set of timed executions and for every execution $E \in \mathrm{close}(\mathcal{E})$ let a
mapping $pt_E : \Phi(\Gamma(E)) \to \mathbb{R} \cup \{\infty\}$ be given. We say that the mappings $pt_E$,
$E \in \mathrm{close}(\mathcal{E})$, map the operations of $E$ to their strong linearization points, if for all
$D, E \in \mathrm{close}(\mathcal{E})$:

(L') $pt_E$ maps the operations in $\Gamma(E)$ to their linearization points, and

(P') if $D$ is a prefix of $E$ then $L(D, pt_D)$ is a prefix of $L(E, pt_E)$.

In this case, $\mathcal{E}$ is called strongly linearizable.

It is immediate that we get the following characterization of strong linearizability of sets of 
histories: 

Lemma 4.12. Let $\mathcal{E}$ be a set of timed executions and $\mathcal{H} = \{H(E) \mid E \in \mathcal{E}\}$. Then $\mathcal{H}$ is strongly
linearizable if and only if $\mathcal{E}$ is strongly linearizable.

Proof. Assume w.l.o.g. that $\mathcal{E} = \mathrm{close}(\mathcal{E})$; thus $\mathcal{H}$ is prefix-closed. First suppose that $\mathcal{E}$ is
strongly linearizable, i.e., there exist mappings $pt_E$, $E \in \mathcal{E}$, that map the operations in $E$ to their
strong linearization points. For each history $H \in \mathcal{H}$, let $E_H$ be the timed execution with $H(E_H) = H$,
where the $i$-th step of $H$ occurs at time $i$. (Such an execution $E_H$ exists because
$\mathcal{E} = \mathrm{close}(\mathcal{E})$ is by definition closed under isomorphism.) Then $E_H \in \mathcal{E}$ and we can define
$f(H) = H(L(E_H, pt_{E_H}))$. By (L'), $f(H)$ is a linearization of $\Gamma(H(E_H)) = \Gamma(H)$. Moreover, if $G$
is a prefix of $H$, then by construction $E_G$ is a prefix of $E_H$, and so from (P') we immediately get that
$f(G)$ is a prefix of $f(H)$. Hence, $f$ is a strong linearization function for $\mathcal{H}$.

Now suppose that $\mathcal{H}$ is strongly linearizable and let $f$ be a strong linearization function of $\mathcal{H}$.
Consider an arbitrary timed execution $E \in \mathcal{E}$, let $H = H(E)$ be the corresponding history, and
let $t_E$ be the corresponding timing function. (I.e., if $E = ((s_1, t_1), (s_2, t_2), \ldots)$, then $t_E(s_i) = t_i$.)
Further, let $k = |f(H)| \in \mathbb{N} \cup \{0, \infty\}$, and let $op_i$, $1 \le i \le k$, be the $i$-th operation in $f(H)$.
Finally, for any point in time $t \in \mathbb{R}$ let $T^*(t) = (t + t')/2$, where $t' > t$ is the point in time of the
first step that occurs in $E$ after time $t$, and $T^*(t) = t + 1$ if no step occurs in $E$ after point $t$.

We inductively define

$pt_E(op_1) = t_E(inv(op_1))$, and

$pt_E(op_i) = \max\{t_E(inv(op_i)), T^*(pt_E(op_{i-1}))\}$ for $1 < i \le k$.

For every operation $op \in \Phi(\Gamma(E)) - \{op_i \mid 1 \le i \le k\}$ we define $pt_E(op) = \infty$.
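The inductive definition of $pt_E$ can be traced on a small example. In this sketch the names `t_star` and `assign_points` and the input encoding are assumptions made for illustration: `step_times` lists the times of all steps of $E$, `inv_time` maps each operation to the time of its invocation, and `linearization` lists the operations of $f(H)$ in order. Operations outside $f(H)$, which would receive the point $\infty$, are omitted.

```python
def t_star(step_times, t):
    """T*(t): midpoint between t and the first step after t, or t + 1 if there is none."""
    later = [s for s in step_times if s > t]
    return (t + min(later)) / 2 if later else t + 1


def assign_points(step_times, inv_time, linearization):
    """Assign points to the operations of f(H) following the inductive definition."""
    points = {}
    previous = None
    for op in linearization:
        if previous is None:
            points[op] = inv_time[op]  # first operation: its invocation time
        else:
            points[op] = max(inv_time[op], t_star(step_times, points[previous]))
        previous = op
    return points


# Hypothetical execution: inv(a) at 0, inv(b) at 1, rsp(b) at 2, rsp(a) at 3.
step_times = [0, 1, 2, 3]
inv_time = {"a": 0, "b": 1}

print(assign_points(step_times, inv_time, ["b", "a"]))  # {'b': 1, 'a': 1.5}
print(assign_points(step_times, inv_time, ["a", "b"]))  # {'a': 0, 'b': 1}
```

In the first call `a` is linearized after `b`, so its point is pushed from its invocation time 0 to $T^*(1) = 1.5$, which still lies inside its interval $[0, 3]$.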

We first prove property (L'). From the definition of $T^*$ we immediately have $pt_E(op_{i+1}) > pt_E(op_i)$
for all $1 \le i < k$. Thus, it suffices to show that $pt_E(op_i) \in I_E(op_i)$ for $1 \le i \le k$. Since
by construction $t_E(inv(op_i)) \le pt_E(op_i)$, we only have to show that $pt_E(op_i) \le t_E(rsp(op_i))$. We
prove by induction on $i$ for all $1 \le i \le k$ the following statement: $pt_E(op_i) \le t_E(rsp(op_i))$ and
$pt_E(op_i) < t_E(rsp(op_j))$ for all $i < j \le k$. For $i = 1$ the statement is true since
$pt_E(op_1) = t_E(inv(op_1))$ and any response step in $E$ occurs no earlier than $t_E(inv(op_1))$, and only
$rsp(op_1)$ can occur at the same time as $inv(op_1)$ (if $op_1$ is atomic in $E$). Thus, let $1 < i \le j \le k$.
Since $f(H)$ is a linearization of $\Gamma(H(E))$, we know that in $E$ the invocation of $op_i$ occurs no later than
the response of $op_j$, and they can occur at the same time only if $i = j$ (and $op_i$ is atomic). Therefore,
$t_E(inv(op_i)) \le t_E(rsp(op_j))$, and equality holds only if $i = j$. Hence, if $pt_E(op_i) = t_E(inv(op_i))$,
we are done. So assume that $t_E(inv(op_i)) < pt_E(op_i)$. By the induction hypothesis, and since
$j > i - 1$, we have $pt_E(op_{i-1}) < t_E(rsp(op_j))$, and then from the definitions of $pt_E$ and $T^*$, we get
$pt_E(op_i) = T^*(pt_E(op_{i-1})) < t_E(rsp(op_j))$.

We now show (P'). Consider two arbitrary timed executions $D, E \in \mathcal{E}$ such that $D$ is a prefix
of $E$. Let $f(H(D)) = (op_1, \ldots, op_k)$. Since $f$ is prefix preserving, $op_1, \ldots, op_k$ are the first $k$
operations in $f(H(E))$. It is immediate from the inductive construction that $pt_D(op_i) = pt_E(op_i)$
for $1 \le i \le k$, and $pt_E(op) > pt_E(op_k) = pt_D(op_k)$ for all operations
$op \in f(H(E)) - \{op_1, \ldots, op_k\}$. Thus, $L(D, pt_D)$ is a prefix of $L(E, pt_E)$. □

Theorem 4.13 (Composability). Let $\mathcal{B}$ and $\mathcal{B}'$ be disjoint sets of base objects. Further, let $O$
be a strongly linearizable implementation of some type $T$ that uses only base objects in $\mathcal{B}$. Also,
for some atomic object $B \in \mathcal{B}$ let $B'$ be an object of the same type as $B$ that is implemented
from objects in $\mathcal{B}'$. Then, the implemented object $O'$ obtained from $O$ by replacing object $B$ with
$B'$ is a strongly linearizable implementation of type $T$ that uses atomic base objects from the set
$(\mathcal{B} - \{B\}) \cup \mathcal{B}'$.

Proof. Let $\mathcal{E}_{O'}$, $\mathcal{E}_O$, and $\mathcal{E}_{B'}$ be the sets of all timed executions that can occur if processes
execute operations on $O'$, $O$, and $B'$, respectively. By the assumption, $\mathcal{E}_O$ and $\mathcal{E}_{B'}$ are strongly
linearizable. Let $pt'_E$ for $E \in \mathcal{E}_{B'}$ and $pt''_D$ for $D \in \mathcal{E}_O$ denote mappings that map executions in
$\mathcal{E}_{B'}$ and $\mathcal{E}_O$, respectively, to their strong linearization points.

Now consider an arbitrary timed execution $E = ((s_1, t_1), \ldots) \in \mathcal{E}_{O'}$. Then $E|B' \in \mathcal{E}_{B'}$, and
thus every operation $op \in E|B'$ is associated with a strong linearization point $pt'_{E|B'}(op)$. For
every such operation $op$ executed by some process $p$ during $E$, we do the following: We remove all
steps from $E$ that process $p$ executes during the interval that starts with $inv(op)$ and ends with
$rsp(op)$, and if $pt'_{E|B'}(op) < \infty$, then we replace those steps with the atomic operation $op$ at time
$pt'_{E|B'}(op)$. Thus, in the resulting execution, $\alpha(E)$, all operations $op$ on object $B'$ are atomic. Also,
$\alpha(E)|B' = L(E|B', pt'_{E|B'})$, and so $H(\alpha(E)|B')$ is a linearization of $\Gamma(E|B')$. Moreover, all steps
that are not on $B'$ occur at exactly the same time in $E$ as in $\alpha(E)$, and every process executes its
steps in program order.

Thus, $\alpha(E)$ is a timed execution in $\mathcal{E}_O$, and we can define the mapping
$pt_E : \Phi(\Gamma(E)) \to \mathbb{R} \cup \{\infty\}$ by

$$pt_E(op) = pt''_{\alpha(E)}(op).$$

All invocation and response steps on $O$ appear at exactly the same time in $E$ as in $\alpha(E)$, i.e.,
$\Gamma(E) = \Gamma(\alpha(E))$. Since $pt''_{\alpha(E)}$ maps the operations in $\Gamma(\alpha(E))$ to their linearization points, it
is obvious that $pt_E$ satisfies property (L').

It remains to show property (P'). Let $D, E \in \mathcal{E}_{O'}$ and let $D$ be a prefix of $E$. Then $D|B'$ is a
prefix of $E|B'$, and since $pt'_{D|B'}$ and $pt'_{E|B'}$ map the operations in $D|B'$ and $E|B'$, respectively, to their
strong linearization points, $\alpha(D)|B' = L(D|B', pt'_{D|B'})$ is a prefix of $\alpha(E)|B' = L(E|B', pt'_{E|B'})$.
As a consequence, $\alpha(D)$ is a prefix of $\alpha(E)$, and so $L(\alpha(D), pt''_{\alpha(D)}) = L(D, pt_D)$ is a prefix of
$L(\alpha(E), pt''_{\alpha(E)}) = L(E, pt_E)$. □

4.7 Which Linearizable Implementations are Strongly Linearizable? 

Many linearizable implementations of shared objects from the literature are actually strongly lin- 
earizable, and hence can be used safely in randomized algorithms, in place of their atomic counter- 
parts. We can identify some of these by examining the proof of linearizability of the implementation. 
Such a proof typically follows one of a few general approaches. Let O be the implemented object 
and let H be a history over O.

In one proof technique, for each operation op on O a unique "linearization point" pt(op) is assigned, which is a shared memory operation that occurs during the execution of operation op. A sequential history S is formed by ordering the operations on O in H so that $op_1$ precedes $op_2$ in S if (and only if) $pt(op_1) \prec_H pt(op_2)$. The construction of S guarantees agreement with the "happens before" order of operations on O in H, and so the proof obligation for linearizability is only to show that S is valid for O. If the linearization points are chosen so that the mapping from H to S also satisfies property (P) (see Definition 4.1), then the implementation is strongly linearizable. Property (P) will hold, for example, if, for any shared memory operation op' in the software method that simulates op, by the time op' is executed it is determined whether or not op' is the linearization point for op. This holds, for instance, if pt(op) is mapped statically to a particular pseudo-code statement (that performs exactly one shared memory operation), irrespective of the schedule.
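Property (P) can be checked mechanically on small examples. The following minimal Python sketch is an illustration only and is not taken from the paper: it assumes a history is a list of (operation id, step label) pairs and that the step labelled "lin" is the statically chosen linearization point. The names `linearize_static`, `linearize_sorted`, `satisfies_prefix_property`, and the step encoding are all assumptions made for this example. The second rule, which reorders already linearized operations, is included only as a contrast that violates the prefix requirement.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]   # (operation id, step label); labels used here: "inv", "lin", "rsp"


def is_prefix(a: List[str], b: List[str]) -> bool:
    """True if sequence a is a prefix of sequence b."""
    return len(a) <= len(b) and b[:len(a)] == a


def linearize_static(history: List[Step]) -> List[str]:
    """Order operations by the position of their fixed 'lin' step."""
    return [op for op, label in history if label == "lin"]


def linearize_sorted(history: List[Step]) -> List[str]:
    """Contrasting rule: order linearized operations by id, ignoring step order."""
    return sorted(op for op, label in history if label == "lin")


def satisfies_prefix_property(f: Callable[[List[Step]], List[str]],
                              history: List[Step]) -> bool:
    """Check that f(D) is a prefix of f(E) for each one-step extension E of D.

    Only the prefixes of the given history are examined; by transitivity of
    the prefix relation this covers every pair of prefixes of that history.
    """
    return all(is_prefix(f(history[:i]), f(history[:i + 1]))
               for i in range(len(history)))


example = [("op1", "inv"), ("op0", "inv"), ("op1", "lin"),
           ("op0", "lin"), ("op1", "rsp"), ("op0", "rsp")]
```

With the static rule, each new step can only append to the sequence produced so far, which is why fixing the linearization point to one statement is enough for property (P) in this toy setting.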

Several published implementations admit such proofs of strong linearizability, for example an obstruction-free double-ended queue [HLM03] and a terminating Compare-And-Swap implementation from atomic registers [GHHW07]. The practical constructions of sets and lists such as the "FineList", "OptimisticList", and "LazyList" described by Herlihy and Shavit [HS08] use fine-grained locking and are strongly linearizable for the same reason. For non-blocking constructions of sets, there typically remains a point in the code (often a strong synchronization operation) associated with each implemented "insert" or "delete" operation, which allows the proofs of linearizability to extend to proofs of strong linearizability (see for example the LockFreeList in [HS08]).

Many universal constructions also fall into this category because they force operations to take effect one at a time.† For example, Herlihy's universal construction for wait-free shared objects [Her91] is strongly linearizable. In this construction, each operation applied on the shared object is represented using a "cell" data structure. Processes cooperate to thread cells onto a list; they reach agreement on the successor of each cell through a consensus object for that cell. The linearization corresponding to the total order of the cells in this list defines a strong linearization.‡

† In that case the linearization order follows easily from the algorithm and a formal proof of linearizability is often not given.

‡ If the implementation is non-deterministic and the operation corresponding to the last cell in the list is not complete, then that operation is dropped from the linearization because its response may not be uniquely determined.

Similarly, any implementation of a shared object obtained by wrapping a mutex around a sequential implementation is strongly linearizable. Combining this observation with known implementations of mutual exclusion using only safe registers yields strongly linearizable implementations of any shared object using only safe registers. Strongly linearizable implementations with O(1) Remote Memory Reference (RMR) complexity are also possible using a queue-based mutex, which can be constructed using atomic registers and certain Read-Modify-Write primitives such as Fetch-And-Add or Fetch-And-Store.

In the second general proof technique, the operations in a history H are first ordered somehow into a valid sequence S, and the proof obligation is to show that S is consistent with the "happens before" order of H. Such proofs typically do not directly translate to proofs of strong linearizability, and the corresponding implementations are frequently not strongly linearizable. Two such examples are in the introduction. Additional examples of implementations in this category include other "textbook" wait-free constructions of strong atomic registers from weaker ones. For instance, any strong adversary has less power against a randomized algorithm using atomic MRSW registers than a particular weak adversary has against the same algorithm when the MRSW registers are replaced with Israeli and Li's linearizable construction from SRSW atomic registers [IL93]. Similarly, any strong adversary has less power against a randomized algorithm using atomic MRMW registers than a weak adversary has against the same algorithm when the MRMW registers are replaced with Vitanyi and Awerbuch's linearizable construction from MRSW atomic registers [VA86]. The impasse is not just for register-like constructions. Herlihy and Wing [HW90] provide a linearizable implementation of a queue object using some read-modify-write objects where the enq operation but not the deq operation is wait-free. There is no strong linearization even for the subset of histories that occur when deq is constrained to be atomic. (Examples of all these situations are included in Appendix A.)

4.7.1 Strong Linearizability of CAS Implementation from Atomic Registers

We now sketch a proof of strong linearizability for the RMR-efficient implementation of Compare-And-Swap from atomic registers [GHHW07]. First, we describe briefly how this implementation works, letting O denote the implemented CAS object. The basic implementation supports Read and CAS operations, and is presented in a simplified format in Figure 3. (A Write operation can be implemented in a strongly linearizable way using similar techniques [Gol10].) The implementation records the state of O using data structures called blocks. Each block contains a variable Val that records a state (i.e., value) of O. (For any block, Val inside that block is written at most once.) The state stored in block b is denoted b.Val. The block that stores the latest state of O is called the current block, and its address is stored in a shared register Cur. An operation that does not change the state of O simply reads Cur to identify the current block, and reads the state of O from a shared variable in that block (lines 1-2 and lines 18-19). An operation that causes a non-trivial state change allocates a new block (line 8), writes the new state to that block (line 9), and writes the address of that block to Cur (line 10). At the heart of the implementation lies a mechanism for efficient synchronization among operations that attempt to apply non-trivial state changes concurrently (line 6). This mechanism ensures that at most one such operation succeeds and the others instead apply trivial state transitions. An efficient signaling mechanism is also used between the leader and losers, as we explain shortly (lines 11 and 14).
Declarations for CAS implementation.

Shared variables: (global)
    Cur - stores address of current block, initially points to a block
          that records the initial value of the CAS object
Shared variables: (per-block)
    Val - records a state (i.e., value) of the CAS object
Private variables: (per-process)
    b, b' - block address
    l - process ID
    v - value of CAS object

Procedure for operation CAS(X, Y)
     1  b := Read(Cur)
     2  v := Read(b.Val)
     3  if v ≠ X then
     4      return v
     5  else
     6      l := leader computed using leader election instance
             associated with block b
     7      if l = this process then
     8          b' := new block
     9          Write b'.Val := Y
    10          Write Cur := b'
    11          signal value Y to losers of leader election
    12          return X
    13      else
    14          wait for signal from leader l
    15          return value Y signaled by leader
    16      end
    17  end

Procedure for operation Read()
    18  b := Read(Cur)
    19  return Read(b.Val)

Figure 3: Implementation of CAS from reads and writes.
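To make the control flow of Figure 3 concrete, here is a minimal Python sketch of the same structure. It is not the implementation from [GHHW07]: leader election is replaced by a `threading.Lock` around a "first caller wins" flag, and the leader-to-loser signal is a `threading.Event`, so the sketch says nothing about RMR complexity or about building these mechanisms from registers. The class and method names (`Block`, `CASObject`, `elect`) are assumptions made for illustration, and each process is assumed to run one operation at a time.

```python
import threading


class Block:
    """A block: a value written once, plus per-block election and signal state."""

    def __init__(self, val):
        self.val = val
        self._guard = threading.Lock()     # stand-in for register-based leader election
        self._leader = None
        self.signal = threading.Event()    # stand-in for the leader-to-loser signal
        self.signaled_value = None

    def elect(self, pid):
        """Return the id of the leader for this block; the first caller wins."""
        with self._guard:
            if self._leader is None:
                self._leader = pid
            return self._leader


class CASObject:
    def __init__(self, initial):
        self.cur = Block(initial)          # Cur: reference to the current block

    def read(self):
        b = self.cur                       # line 18
        return b.val                       # line 19

    def cas(self, pid, x, y):
        b = self.cur                       # line 1
        v = b.val                          # line 2
        if v != x:                         # line 3
            return v                       # line 4
        leader = b.elect(pid)              # line 6
        if leader == pid:                  # line 7
            new_block = Block(y)           # lines 8-9
            self.cur = new_block           # line 10
            b.signaled_value = y           # line 11
            b.signal.set()
            return x                       # line 12
        b.signal.wait()                    # line 14
        return b.signaled_value            # line 15
```

When several threads call `cas(pid, 0, pid)` concurrently on an object initialised to 0, exactly one call returns 0 and every other call returns the winner's value, either because it lost the election and waited for the signal or because it already read the new block.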
We now illustrate how the implementation works with a short example. Let $s_0, s_1, s_2, \ldots$ denote the sequence of states O takes on (ignoring trivial state transitions), and let $b_0, b_1, b_2, \ldots$ denote the blocks that store these states. (The state of a CAS object is simply the value it stores.) Suppose that the current block is Cur = $b_2$ and processes $p_1, p_2, p_3, p_4$ invoke the following operations concurrently: $p_1$, $p_2$ and $p_3$ invoke CAS($s_2$, $X_1$), CAS($s_2$, $X_2$) and CAS($s_2$, $X_3$) respectively, where $s_2 \notin \{X_1, X_2, X_3\}$; and $p_4$ invokes CAS($Y$, $Z$) for some $Y \notin \{s_2, X_1, X_2, X_3\}$ and arbitrary $Z$. Process $p_4$'s operation is dealt with easily; upon discovering that the state $s_2$ stored in block Cur = $b_2$ is different from $Y$, $p_4$'s CAS($Y$, $Z$) does not change the state of O and returns $s_2$ (line 4). In contrast, each of $p_1, p_2, p_3$ has a chance to change the state of O, although only one of them may actually do so. Thus, $p_1, p_2, p_3$ elect a leader, say $p_2$, which allocates a new block $b$ and records $X_2$ in it (lines 8-9). (A separate "instance" of leader election is used to synchronize each non-trivial state change, and we associate such instances with blocks on a one-to-one basis. In this example, $p_1, p_2, p_3$ use the instance associated with block $b_2$.) Process $p_2$ then sets Cur = $b$ (line 10) and its CAS($s_2$, $X_2$) returns $s_2$ (line 12). Thus, $b_3 = b$ and $s_3 = X_2$ hold, and the state of O changes from $s_2$ to $s_3$. Meanwhile, $p_1$ and $p_3$ wait until $p_2$ has written Cur (line 14), and their operations return $s_3$ (line 15). (The linearizability of the implementation depends crucially on $p_1$ and $p_3$ waiting for $p_2$ in this situation.) Thus, their ("failed") CAS operations cause trivial state transitions from $s_3$ back to $s_3$, and appear to take effect just after $p_2$'s ("successful") CAS.

To show strong linearizability, we will use timed executions (see Section 4.6). For any history H of the implementation, define a corresponding timed execution E where the i'th step occurs at time i. Let $\mathcal{E}$ denote the set of such timed executions. For any $E \in \mathcal{E}$, we define the mapping $pt_E$ from the operations in $\Gamma(E)$ to $\mathbb{R} \cup \{\infty\}$ as follows. If op is a Read operation that invokes a read of Cur at time t, then $pt_E(op) = t$, otherwise op is pending in E and $pt_E(op) = \infty$. If op is a CAS(X, X) operation for some X (i.e., the comparison value is the same as the new value), we treat it just like a Read. If op is a CAS(X, Y) operation for some X and $Y \neq X$, then $pt_E(op)$ depends on the execution path of op. If op is pending and does not read Cur at all, then $pt_E(op) = \infty$. If op invokes a read of Cur at time t, and the state of O in the block read is not X, then op is a "failed" CAS and $pt_E(op) = t$. If op invokes a read of Cur at time t, and the state of O in the block read is X, then op may succeed or fail depending on the outcome of leader election (i.e., the "instance" of leader election associated with the block whose address op reads from Cur). If the outcome of leader election is not "decided" at the end of E (i.e., different extensions of E may lead to different outcomes), or the outcome is decided but the leader's operation has not yet overwritten Cur, then $pt_E(op) = \infty$. (In this case op is pending because the implementation ensures that the leader writes Cur before op terminates, even if op is not the leader's operation.) Otherwise the write to Cur by the leader is invoked at some time t. If op is the leader's operation then $pt_E(op) = t$. Finally, if op is not the leader's operation, then $pt_E(op) = t + \epsilon$ where $0 < \epsilon < 1$ is an arbitrary constant unique for each process.

It is straightforward to verify that for each $E \in \mathcal{E}$, $L(E, pt_E)$ is a linearization of $\Gamma(E)$. Next,
consider Definition 4.11. Property (L') follows easily for any operation op whose linearization point
is the time of a step that op itself takes, which is always between the times of inv(op) and rsp(op). In all remaining cases, the linearization point is of the form $t + \epsilon$ where t is the time of a step that is not part of op but nevertheless occurs after inv(op) and before rsp(op). Since the times at which steps occur are integer-valued by construction of E, and since $0 < \epsilon < 1$, it follows that $t + \epsilon$ is also between the times of inv(op) and rsp(op), as wanted. For property (P'), consider timed executions $D, E \in \mathcal{E}$ such that D is a prefix of E. It suffices to consider the case when E extends D by one step, say s at time t, and show that $L(D, pt_D)$ is a prefix of $L(E, pt_E)$. If s is not a step that reads or writes the shared variable Cur, then $L(E, pt_E) = L(D, pt_D)$. If s reads Cur and is part of some operation op, then either $pt_E(op) = \infty$ and $L(E, pt_E) = L(D, pt_D)$ (i.e., op is pending and trying to cause a non-trivial state change), or else $L(E, pt_E)$ is an extension of $L(D, pt_D)$ by an invocation/response pair for op (i.e., op causes a trivial state change). Finally, if s writes Cur and is part of operation op, then $L(E, pt_E)$ is an extension of $L(D, pt_D)$ by an invocation/response pair for op, followed by zero or more invocation/response pairs for operations whose linearization points are of the form $t + \epsilon$, $0 < \epsilon < 1$.

5 Weak Adversaries 

In this section we show that it is impossible to strengthen linearizability in a way that limits the 
power of a weak adversary to influence the result of an execution in the same sense as strong 
linearizability limits the power of the strong adversary, when implementations are obtained only 
from atomic registers and load-linked/store-conditional. To prove this, we consider a particular 
algorithm that uses shared objects of a "strong counter" type. The state of this type is an integer 
(initially 0) and the operations supported are fetch&inc() and fetch&dec(). These operations 
increment and decrement the counter, respectively, and also return the prior value of the counter. 
In the executions we will consider, operations will be invoked in such a way that the counter's value 
is always in [0, n] where n is the maximum number of processes. 

In Figure 4 we present a simple algorithm that uses $\sqrt{n}$ strong counters. (Throughout this section we assume that n is a perfect square.) Each process chooses one of the counters uniformly at random, then calls fetch&inc(), and finally calls fetch&dec(). Now fix a weak adversary $A$ and an integer $K_{\max}$. Let H be a random history obtained by a run of LoadBalance() scheduled by $A$. For each process p let $X_{A,p}$ be the random variable defined as follows: If during H the maximum point contention† is at most $K_{\max}$ and p's fetch&inc() terminates, then $X_{A,p}$ is the value returned by that fetch&inc(). If p's fetch&inc() does not terminate during H, or if the point contention exceeds $K_{\max}$, then define $X_{A,p} = 0$. We are interested in the maximum expectation of $X_{A,p}$ over all processes, i.e.,

$$\Phi(A) := \max\{E[X_{A,p}] \mid p \in \mathcal{P}\}.$$

It is not hard to see that if the fetch&inc() operation is atomic, then for all weak adversaries $A$, $\Phi(A) \le K_{\max}/\sqrt{n}$. In particular, if $K_{\max} = \Theta(\sqrt{n})$, then $\Phi(A) = O(1)$.

On the other hand, we will show that if the fetch&inc() operations are based on a "natural" terminating (or lock-free) implementation that uses only read, write, and LL/SC operations, then for $K_{\max} = \Omega(\sqrt{n})$ there exists a weak adversary $A$ such that $\Phi(A) = \Omega(K_{\max}) = \Omega(\sqrt{n})$. An implementation of a type $\tau$ is natural if each instance O of the implemented object has its own set $B_O$ of base objects such that an operation on the instance O accesses only base objects in $B_O$. Essentially all linearizable implementations of objects are natural, as otherwise the composition of multiple objects would effectively create a single new object.

Strong counters can be implemented with various progress properties from atomic read, write, and LL/SC operations. For example, a deterministic wait-free implementation can be obtained using Herlihy's universal construction [Her91], with consensus objects simulated in a straightforward way from LL/SC. Given only read and write, a deterministic terminating implementation is possible using Yang and Anderson's mutual exclusion algorithm [YA95]. (Alternately, one can simulate LL/SC in the wait-free implementation using read and write [GHHW07], which yields a terminating strong counter implementation from read and write only.) All these implementations are natural.

† The maximum point contention is the maximum number of processes that at the same point in time have called LoadBalance() but not yet finished that function call, where the maximum is taken over all points in time.

Function LoadBalance()
    Shared data: √n shared strong counters, F_0, ..., F_{√n-1}
    1  Choose i ∈ {0, ..., √n - 1} uniformly at random.
    2  x := F_i.fetch&inc()
    3  F_i.fetch&dec()
    4  return x

Figure 4: The load balancing algorithm.
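The behaviour of the LoadBalance algorithm of Figure 4 with atomic counters can be explored with a small simulation. The sketch below is an illustration under simplifying assumptions, not the paper's model: processes arrive one at a time, each performs its random choice and fetch&inc() as one uninterrupted step, and the oldest active process performs its fetch&dec() whenever the bound `max_contention` would otherwise be exceeded. This schedule does not depend on the random choices. The names `AtomicStrongCounter`, `run_load_balance`, and `max_contention` are hypothetical.

```python
import random
from math import isqrt


class AtomicStrongCounter:
    """Sequential counter whose operations return the prior value."""

    def __init__(self):
        self.value = 0

    def fetch_and_inc(self):
        prior = self.value
        self.value += 1
        return prior

    def fetch_and_dec(self):
        prior = self.value
        self.value -= 1
        return prior


def run_load_balance(n, max_contention, seed=0):
    """Run n LoadBalance() calls on sqrt(n) counters under a fixed schedule.

    Returns the list of values x returned by the calls and the final counter
    values. At most max_contention calls are active at any time.
    """
    m = isqrt(n)
    if m * m != n or max_contention < 1:
        raise ValueError("n must be a perfect square and max_contention >= 1")
    rng = random.Random(seed)
    counters = [AtomicStrongCounter() for _ in range(m)]
    active = []      # counter indices of calls between fetch&inc and fetch&dec
    returns = []
    for _ in range(n):
        if len(active) == max_contention:
            counters[active.pop(0)].fetch_and_dec()    # line 3 of the oldest call
        i = rng.randrange(m)                           # line 1
        returns.append(counters[i].fetch_and_inc())    # line 2
        active.append(i)
    while active:
        counters[active.pop(0)].fetch_and_dec()        # remaining line 3 steps
    return returns, [c.value for c in counters]
```

In this setting a returned value can never exceed the number of other active calls, so every return value is at most `max_contention - 1`; averaging `returns` over many seeds gives an empirical view of the expectation discussed in the text.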

We now present our main result, stating that a weak adversary gains additional power against 
algorithm LoadBalance no matter how the strong counters are implemented from read, write, 
and LL/SC. 

Theorem 5.1. Let $K_{\max} = \lfloor (1+\delta)\sqrt{n} \rfloor$, $\delta > 0$. If the algorithm in Figure 4 (LoadBalance) is used with atomic strong counters, then for any weak adversary $A$, $\Phi(A) \le 1 + \delta = O(1)$ holds. On the other hand, if the algorithm LoadBalance is used with a linearizable, natural, possibly randomized, terminating (or lock-free) implementation of strong counters from atomic read, write, and LL/SC operations, then there exists a weak adversary $A$ such that $\Phi(A) = \Omega(K_{\max}) = \Omega(\sqrt{n})$.

The full proof of the theorem is quite complicated and given in the extended version of this paper. 
Here we sketch the main idea. 

Proof sketch. For the upper bound, observe that at the point in time when a process p makes its random choice, the expected value of the counter it chooses is at most $k/\sqrt{n}$, where k is the number of processes currently active. Since the weak adversary cannot intervene between p's random choice and p's fetch&inc(), that operation's expected return value is at most $(k-1)/\sqrt{n}$.

For the lower bound, we construct for each process p a weak adversary $A_p$ that tries to "fool" p. We fix the coin flips that processes receive arbitrarily. Then we choose one process p at random, use $A_p$ for the scheduling, and obtain that the expected return value of p's fetch&inc() is $\Omega(k)$, where k is the number of processes that randomly chose the same counter as p. Also, point contention is at most k + 1. If coin flips are uniformly random, then k is highly concentrated around $\sqrt{n}$, so with high probability it will be $\Omega(\sqrt{n})$ but also not exceed $K_{\max}$.

So the main goal of the adversary $A_p$ is to make p's fetch&inc() call return a value with expectation $\Omega(k)$. We achieve this as follows. First, we let process p take one step, which reveals the index $i^*$ of the counter it is using. Then we schedule each process $q \neq p$, one after the other. If q accesses a different counter than the one chosen by p, we let q run solo until it finishes the algorithm (after that q does not contribute to point contention anymore). If q also chooses the counter with index $i^*$, then we let it take exactly one step and then stall it. This way, eventually all k processes that chose the same counter as p are stalled, while all other processes are finished. Moreover, the maximum point contention encountered is at most k + 1.
Now we partition the set of stalled processes into three sets. Let $op_q$ be the first (and only) operation process q executed so far. The set $\mathcal{Q}$ contains all processes q where $op_q$ is not a write. The set $\mathcal{V}$ contains all processes q where $op_q$ is a write to a register R to which no other process writes in its first step. Finally, $\mathcal{W}$ is the set of all remaining processes, which write to some register that is also written by another process. Note that the sets $\mathcal{Q}$, $\mathcal{V}$, and $\mathcal{W}$ are uniquely determined by the first steps executed by all processes, and thus by the coin flips processes use. But they are independent of the choice of p.
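The three-way partition of the stalled processes depends only on each process's first shared memory step, which the following minimal Python sketch makes explicit. The encoding of a first step as an (operation kind, register name) pair and the function name `partition_stalled` are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, Set, Tuple

FirstStep = Tuple[str, str]   # (operation kind, register name), e.g. ("write", "R1")


def partition_stalled(first_steps: Dict[str, FirstStep]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split stalled processes into the sets Q, V, W described in the text.

    Q: first step is not a write.
    V: first step writes a register that no other stalled process writes first.
    W: first step writes a register that another stalled process also writes first.
    """
    writers = Counter(reg for kind, reg in first_steps.values() if kind == "write")
    q_set, v_set, w_set = set(), set(), set()
    for proc, (kind, reg) in first_steps.items():
        if kind != "write":
            q_set.add(proc)
        elif writers[reg] == 1:
            v_set.add(proc)
        else:
            w_set.add(proc)
    return q_set, v_set, w_set


steps = {
    "p1": ("read", "R1"),
    "p2": ("write", "R1"),
    "p3": ("write", "R2"),
    "p4": ("write", "R2"),
    "p5": ("LL", "R3"),
    "p6": ("SC", "R3"),
}
Q, V, W = partition_stalled(steps)
```

Because the three sets partition the k stalled processes, at least one of them has size at least k/3, which is the case distinction used in the remainder of the sketch.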

We make use of a positive correlation between the size of each set and the probability with which p is from that set. Suppose $\mathcal{Q}$ is large, i.e., $|\mathcal{Q}| \ge k/3$. Then the probability that $p \in \mathcal{Q}$ is at least 1/3. Moreover, given that $p \in \mathcal{Q}$, p's first operation $op_p$ is not a write, and thus leaves no "trace" (note that if it is an SC, then it fails). In this case, we stall p and let all other k - 1 processes finish their $F_{i^*}$.fetch&inc() operation. After that $F_{i^*}$ has value k - 1, so that when we finally let p finish its fetch&inc(), its return value is k - 1.

Now suppose $|\mathcal{V}| \ge k/3$, so the probability that $p \in \mathcal{V}$ is at least 1/3. If $p \in \mathcal{V}$, we let all processes in $\mathcal{V}$ run in a round-robin fashion in some predetermined order, until they have finished their algorithm. It can be argued that the choice of p among processes in $\mathcal{V}$ has no influence on what processes in $\mathcal{V}$ observe in the resulting execution. Hence, the fetch&inc() operations of all processes in $\mathcal{V}$ return distinct values. Given that $p \in \mathcal{V}$ is chosen uniformly at random, p's fetch&inc() return value has an expectation of at least $(|\mathcal{V}| - 1)/2 = \Omega(k)$.

Finally, suppose $|\mathcal{W}| \ge k/3$. If $p \in \mathcal{W}$, then $op_p$ is a write that was overwritten by the operation $op_q$ of some other process q (recall that p's write was scheduled before any other process took a step). Moreover, since the first operation of all other processes in $\mathcal{V} \cup \mathcal{W}$ is a write, none of these processes can have "seen" p before p was overwritten. Any process that may have seen p cannot have executed a write in its first step, so it did not become "visible" itself. We let all processes in $(\mathcal{V} \cup \mathcal{W}) - \{p\}$ finish their fetch&inc() call by scheduling them in a round-robin fashion. They can only see themselves, and not any process outside of this set, so they will increase the value of the counter to at least $|\mathcal{V} \cup \mathcal{W}| \ge k/3$. When we run p afterward, its fetch&inc() call must return a value of $\Omega(k)$. □

For the complete proof of Theorem 5.1 we rely on Theorems 5.2 and 5.3 below.

Theorem 5.2. If the LoadBalance algorithm (Figure 4) is used with atomic strong counters, then for any weak adversary $A$ and any integer $K_{\max}$, $\Phi(A) \le K_{\max}/\sqrt{n}$.

Proof. Let $m = \sqrt{n}$ and let $A$ be an arbitrary weak adversary. Fix an arbitrary process p. Since $X_{A,p}$ is 0 if $K > K_{\max}$ (where $K$ denotes the maximum point contention), it suffices to show that $E[X_{A,p} \mid K] \le (K-1)/m$.

Consider the system configuration immediately before p makes its random choice in line 1. Suppose that in this configuration the counter $F_j$, $0 \le j < m$, has value $b_j$. Then clearly there are at least $b_0 + \cdots + b_{m-1} + 1$ processes active (including p). Thus, given $K$, we have $b_0 + \cdots + b_{m-1} + 1 \le K$. Hence, when p makes its random choice, the expected value of the counter chosen by p is

$$\sum_{0 \le j < m} \frac{b_j}{m} \le \frac{K-1}{m}.$$

Since the adversary cannot intervene between p's random choice and p's following fetch&inc() operation, the expected return value of that operation is at most $(K-1)/m$, as wanted. □
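As a worked step that the text leaves implicit, the conditional bound from the proof can be combined with the choice of $K_{\max}$ in Theorem 5.1. The derivation below assumes, as in the proof, that $K$ denotes the maximum point contention, that $X_{A,p} = 0$ whenever $K > K_{\max}$, and that $K_{\max} = \lfloor (1+\delta)\sqrt{n} \rfloor$ with $m = \sqrt{n}$:

```latex
\begin{align*}
E[X_{A,p}]
  &\le \max_{K \le K_{\max}} E[X_{A,p} \mid K]
   \le \frac{K_{\max} - 1}{m}
   < \frac{K_{\max}}{\sqrt{n}}, \\
\frac{K_{\max}}{\sqrt{n}}
  &= \frac{\lfloor (1+\delta)\sqrt{n} \rfloor}{\sqrt{n}}
   \le 1 + \delta .
\end{align*}
```

Taking the maximum over all processes p then gives $\Phi(A) \le 1 + \delta$, which matches the upper bound stated in Theorem 5.1.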
Next, we consider the case when the LoadBalance algorithm (Figure 4) is used with linearizable implementations of strong counters from atomic read, write and LL/SC operations. Our analysis applies to any terminating or lock-free (hence wait-free) implementation of strong counters from read, write and LL/SC that is natural.

Theorem 5.3. Let $K_{\max} = \lfloor (1+\delta)\sqrt{n} \rfloor$, for some arbitrarily small $\delta > 0$. Consider the LoadBalance algorithm (Figure 4) used with a linearizable, natural, possibly randomized, terminating (or lock-free) implementation of strong counters from atomic read, write, and LL/SC operations. Then there exists a weak adversary $A$ such that $\Phi(A) = \Omega(\sqrt{n})$.

To prove Theorem 5.3, first we specify the adversary $A$ (in Subsection 5.1), and second present the analysis that bounds $\Phi(A)$ from below (Subsection 5.2).

For the purpose of the following definition, it is convenient to assume w.l.o.g. that registers store pairs of values, where the second component of such a pair stores which process changed the register last. We can achieve this by having processes write their ID into the second component of the pair with each successful write or SC operation. We say that process q marks the register with its ID. If later on some other process $z \neq q$ writes to the same register, or performs a successful SC operation on that register, q's mark gets replaced by z's mark.

We say that in some configuration C process p is visible if in C some register is marked by p. During a history H a process q sees process $p \neq q$ if

• either q reads or performs an LL operation on a register marked by p, or

• q executes an SC operation on some register R, at a point in time when R is marked by p, and after q has executed an LL operation on R. (If this happens, then q's SC operation fails.)

Observation 5.4. If H is a history where no process in $P$ sees a process not in $P$, then $H|P$ is
indistinguishable from H to all processes in $P$.
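The notions of marking, visibility, and seeing can be tracked with a small bookkeeping structure. The Python sketch below is a simplified illustration and covers only read, write, and LL steps; the SC case of the definition is deliberately omitted. The class name `MarkedMemory` and its methods are hypothetical.

```python
class MarkedMemory:
    """Registers that remember which process wrote them last (their mark)."""

    def __init__(self):
        self.value = {}     # register name -> stored value
        self.mark = {}      # register name -> id of the last writer
        self.sees = set()   # pairs (q, p): process q has seen process p

    def _observe(self, q, reg):
        """Record that q sees the process whose mark is on reg, if any."""
        p = self.mark.get(reg)
        if p is not None and p != q:
            self.sees.add((q, p))

    def write(self, q, reg, val):
        """A write replaces both the value and the previous mark."""
        self.value[reg] = val
        self.mark[reg] = q

    def read(self, q, reg):
        self._observe(q, reg)
        return self.value.get(reg)

    def load_linked(self, q, reg):
        """For the purpose of 'seeing', an LL behaves like a read."""
        self._observe(q, reg)
        return self.value.get(reg)

    def visible(self, p):
        """p is visible if some register currently carries p's mark."""
        return p in self.mark.values()
```

For example, if p writes a register, q then reads it, and a third process z overwrites it, then q has seen p although p is no longer visible afterwards.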

5.1 Specification of the Adversary 

First we describe for each process p some weak adversary $A_p$ that is trying to "fool" the process p. For the analysis we will then choose p at random.

Adversary $A_p$, $p \in \mathcal{P}$, is defined as follows: We let p take steps until p has performed its first shared memory access (during its fetch&inc() operation). Note that this access reveals the index $i^*$ of the counter $F_{i^*}$ that p chose. After that we stall p.

Next we consider each of the remaining processes $q \in \mathcal{P} - \{p\}$, one after the other, in the order of their IDs. We let q take steps until it has executed its first shared memory access and thus revealed its choice of a counter $F_{i_q}$. If $i_q \neq i^*$, we let q run solo, until it has finished its run of LoadBalance completely. Otherwise, we stall q. This is continued until each process in $\mathcal{P}$ either has finished its LoadBalance algorithm or is stalled.

Let H' be the execution obtained so far, and C the system configuration at the end of H'. Further, let $\mathcal{P}_j$, $0 \le j < m$, denote the set of the processes that selected the counter j during H'. Note that in H', all processes in $\mathcal{P} - \mathcal{P}_{i^*}$ finish their LoadBalance() call, and all processes in $\mathcal{P}_{i^*}$ (including p) have executed exactly one shared memory access. We distinguish two cases.
Case 1: In configuration C, p is visible. Let $\mathcal{W}$ be the set of processes in $\mathcal{P}_{i^*}$ whose first shared memory access was a write. Note that among all processes in $\mathcal{P}_{i^*}$, only processes in $\mathcal{W}$ could have changed the value of a shared register during H' (all SC operations must have failed, because they were not preceded by an LL). Hence, since p is visible in C, $p \in \mathcal{W}$. We let all processes in $\mathcal{W}$, including p, take steps in a round-robin way, ordered by their IDs. Since no process in $\mathcal{W}$ can ever see a process not in $\mathcal{P}_{i^*}$, all those processes eventually make progress and finish their fetch&inc() call. When a process finishes its fetch&inc() call, it is stopped and the scheduling continues with the remaining processes in $\mathcal{W}$, until all processes are stopped.

Case 2: In configuration C, p is not visible. Although p is not visible in configuration C, it is possible that some processes see p during H'. This can happen, for example, if p writes to a register R, then some process q reads R, and finally a third process writes to R, overwriting whatever p has written. Let $S$ be the set that contains p and all processes $q \neq p$ that see p during H'. We let all processes in $\mathcal{P}_{i^*} - S$ take steps in a round-robin way, ordered by their IDs. Since each process $q \in S - \{p\}$ sees p when q executes its first shared memory operation, that operation must be either an LL, a read, or a failed SC. Hence, no process in $S$ is visible in configuration C or has been seen by a process in $\mathcal{P}_{i^*} - S$. Thus, all processes in $\mathcal{P}_{i^*} - S$ eventually make progress and finish their fetch&inc() call when scheduled in a round-robin fashion. When this happens for a process, that process is stopped, and the scheduling continues with the other processes, until all processes are stopped. Finally, we let p run until it has finished its fetch&inc() call. (Since no other process $q \in S$ is visible, and p has not seen a process from $S$, p's fetch&inc() call will eventually terminate.)

5.2 Analysis 

For our analysis we assume that n is a perfect square and let $m = \sqrt{n}$. Later in the analysis, we will fix a sequence of coin flips in such a way that each process gets the same coin flips during every run of the algorithm, no matter what the scheduling is. Let c denote such a sequence of coin flips. (Note that c is not the same as the coin-flip vectors c used in previous sections.) However, for any fixed adversary $A$ and a fixed process p, each coin-flip sequence c uniquely determines a random history H and we can choose c uniformly at random to determine $X_{A,p} = X_{A,p}(c)$.

For the analysis, we will let the adversary $A_p$ do the scheduling in order to try to "fool" process p. Since we don't know which adversary is the best for the purpose of a lower bound, we choose $p \in \mathcal{P}$ at random. To emphasize that p, $A$ and c are random (although there will be a dependence between p and $A$), we now denote $X_{A,p}(c)$ by the random variable $X = X(A, p, c)$. We prove the following:

Lemma 5.5. For c and $p \in \mathcal{P}$ chosen uniformly at random,

$$E_{p,c}[X(A_p, p, c)] = \Omega(\sqrt{n}).$$

By averaging over all processes $p \in \mathcal{P}$, there exists a process p such that for random c

$$E_c[X(A_p, p, c)] = \Omega(\sqrt{n}). \tag{4}$$

Since

$$\Phi(A_p) = \max_{q \in \mathcal{P}} \left( E_c[X(A_p, q, c)] \right) \ge E_c[X(A_p, p, c)],$$

Theorem 5.3 follows immediately. Thus, it suffices to prove Lemma 5.5.

In order to prove the lemma, we first consider an arbitrarily fixed coin-flip sequence c, and we choose only $p \in \mathcal{P}$ uniformly at random. Let H = H(p) be the random history obtained if adversary $A_p$ schedules a run of LoadBalance(), for the fixed coin flips c. By construction of $A_p$, during H process p's fetch&inc() call returns. We write L = L(p) for the random variable that denotes the return value of that fetch&inc() call. (Note that the value of X can be 0 even if L > 0, because in H contention might exceed $K_{\max}$.) Since c is fixed, each process gets the same results from its coin flips, independently of the scheduling, and thus independently of the choice of p. Hence, the set $\mathcal{P}_i$, $0 \le i < m$, of processes that choose counter $F_i$ is fixed, too. Moreover, since we choose $p \in \mathcal{P}$ at random, the counter chosen by p is determined by which set $\mathcal{P}_i$ process p is taken from.

In the following, we analyze the expectation of X conditioned on the event that p is in $\mathcal{P}_j$ for an arbitrary index j. More precisely, we show the following:

Lemma 5.6. For any fixed c, any $0 \le j < m$, and for $p \in \mathcal{P}_j$ chosen uniformly at random,

$$E[L] = \Omega(|\mathcal{P}_j|).$$

For ease of notation, we prove the statement for j = 0. Analogous arguments prove the statement for an arbitrary choice of j, and so we lose no generality.

Let H' = H'(p) be the prefix of H that ends when configuration C = C(p) is reached (i.e., when all processes in $\mathcal{P}_0$ are stalled after they have executed their first shared memory access), and let H'' = H''(p) be the suffix such that $H = H' \circ H''$. Let $op_q$ be the shared memory operation process $q \in \mathcal{P}_0$ performs during H'. Let $\mathcal{V}$ be the set of processes in $\mathcal{P}_0$ that are visible in configuration C and whose first shared memory operation during H' is not a write to a register that was previously written by some other process. I.e., the operation $op_q$ of a process $q \in \mathcal{V}$ is a write to some register R, such that no other process q' writes to R during H'.

Let $\mathcal{E}$ be the event that p is visible in configuration C. Since p is the first process to execute a shared memory access, $\mathcal{E}$ occurs if and only if $p \in \mathcal{V}$. Note that if event $\mathcal{E}$ occurs, then the adversary acts as described under Case 1, otherwise as in Case 2. In the following we give lower bounds on the conditional expectation of L for both cases.

5.2.1 Analysis of Case 1 

Claim 5.7. The set $\mathcal{V}$ is independent of $p$, and $E[L \mid \mathcal{E}] \ge (|\mathcal{V}| - 1)/2$.

Proof. Recall that in Case 1 we assume that event $\mathcal{E}$ has occurred, and so $p \in \mathcal{V}$. On the other
hand, process $q$ is in $\mathcal{V}$ if and only if $op_q$ is a write to some register $R_q$ and no other process writes
to $R_q$ during $H'$. Whether a process writes in the first step, and if so, to which register it writes,
depends only on $q$'s coin flips, i.e., on $c$, but not on the choice of $p$. Thus, $\mathcal{V}$ is independent of $p$.

Now let $C_R = C_R(p)$ be the (random) configuration of the shared memory at the end of $H'$.
(Note that process states are not captured by $C_R$.) Note that the position in $H'$ of the event where
a process $q \in \mathcal{V}$ executes its shared memory operation $op_q$ on register $R_q$ has no effect on the
configuration $C_R$ because $q$ is the only process in $\mathcal{P}$ writing to $R_q$. Also, given that $p \in \mathcal{V}$, in $H'$
the relative order of steps by processes that are not in $\mathcal{V}$ is independent of $p$. Hence, given that
$p \in \mathcal{V}$, the register configuration $C_R$ is independent of $p$. Formally,

$$\forall q, q' \in \mathcal{V}:\ C_R(q) = C_R(q'). \tag{5}$$

Recall that $\mathcal{W}$ is the set of processes in $\mathcal{P}_0$ whose first shared-memory step is a write. Clearly,
a process in $\mathcal{W}$ cannot see any other process during $H'$, because it performs no more steps after
its write. Hence, given $p \in \mathcal{V} \subseteq \mathcal{W}$, what processes in $\mathcal{W}$ observe during $H'$ is independent of $p$.
Thus,

$$\forall q, q' \in \mathcal{V}:\ H'(q) \sim_{\mathcal{W}} H'(q'). \tag{6}$$

Hence, at the end of $H'$ the state of a process in $\mathcal{W}$ is independent of the choice of $p \in \mathcal{V}$. From
this and (5),

$$\forall q, q' \in \mathcal{V}:\ C(q) \sim_{\mathcal{W}} C(q'). \tag{7}$$

In the second part of the scheduling by adversary $A_p$, which determines $H''$, only processes in $\mathcal{W}$
get scheduled in a round-robin way, ordered by their IDs. Given $p \in \mathcal{V} \subseteq \mathcal{W}$, that scheduling
is independent of $p$. Since, given $p \in \mathcal{V}$, processes in $\mathcal{W}$ cannot distinguish between the possible
configurations $C$ obtained at the end of $H'$, they cannot distinguish between the histories $H''(p)$,
$p \in \mathcal{V}$, either. I.e.,

$$\forall q, q' \in \mathcal{V}:\ H''(q) \sim_{\mathcal{W}} H''(q').$$

From this and (6) we get

$$\forall q, q' \in \mathcal{V}:\ H(q) \sim_{\mathcal{W}} H(q').$$

As a consequence, given $p \in \mathcal{V}$, the value returned by the fetch&inc() of a process $q \in \mathcal{V}$ is
independent of $p$. Since all processes in $\mathcal{V}$ finish their fetch&inc() during $H$, but none of them
starts its fetch&dec(), the values processes in $\mathcal{V}$ receive from their fetch&inc() calls are all
distinct non-negative integers from a (fixed) set $S = \{s_1, \ldots, s_{|\mathcal{V}|}\}$. Thus, given that $p$ is distributed
uniformly over $\mathcal{V}$, the return value of $p$'s fetch&inc() is distributed uniformly over $S$. The
expectation of that value is at least $(|\mathcal{V}| - 1)/2$. □
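
The final bound in the proof of Claim 5.7 can be spelled out as a short worked inequality. The ordering $s_1 < s_2 < \cdots$ is an assumption introduced here only for the derivation: since the return values are distinct non-negative integers, the $i$-th smallest one is at least $i - 1$.

```latex
\[
  E[L \mid \mathcal{E}]
  \;=\; \frac{1}{|\mathcal{V}|} \sum_{i=1}^{|\mathcal{V}|} s_i
  \;\ge\; \frac{1}{|\mathcal{V}|} \sum_{i=1}^{|\mathcal{V}|} (i-1)
  \;=\; \frac{1}{|\mathcal{V}|} \cdot \frac{|\mathcal{V}|\,(|\mathcal{V}|-1)}{2}
  \;=\; \frac{|\mathcal{V}|-1}{2}.
\]
```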

5.2.2 Analysis of Case 2 

Claim 5.8. Let $H$ be an arbitrary history of Algorithm 3 (LoadBalance), and let $P \cup \{p\}$,
$p \notin P$, be a set of processes that call $F_i$.fetch&inc(), $i \in \{0, \ldots, m-1\}$. Suppose the following
hold for $H$:

(a) the processes in $P \cup \{p\}$ all finish their $F_i$.fetch&inc(),

(b) no process in $\mathcal{P}$ calls $F_i$.fetch&dec(), and

(c) none of the processes in $P$ sees a process in $\overline{P}$.

Then process $p$'s $F_i$.fetch&inc() returns a value of at least $|P|$.

Proof. By (a) and (b) all processes in $P \cup \{p\}$ receive distinct values from their fetch&inc()
calls. By (c) and Observation 5.4, $H$ and $H|P$ are indistinguishable to the processes in $P$, and
so the return value of a fetch&inc() call by a process $q \in P$ is at most $|P| - 1$. Hence, the
set of fetch&inc() call return values of processes in $P$ is exactly $\{0, \ldots, |P| - 1\}$, and so $p$'s
fetch&inc() call must return a value of at least $|P|$. □

Footnote to (5): With $C_R(q)$ we denote the register configuration obtained if the random process $p$ equals $q$. Similarly we write $H(q)$,
$H'(q)$, $H''(q)$, and $C(q)$.

Now let $Q$ be the set of processes in $\mathcal{P}_0$ whose first shared-memory access is a read, a LL, or a
SC (which fails). Note that $Q$ is independent of the random choice of $p \in \mathcal{P}_0$. Recall that $\mathcal{S}$ is the
set that contains $p$ and all processes that see $p$ during $H'$. Since the fetch&inc() implementation
is natural, $\mathcal{S} \subseteq \mathcal{P}_0$. Let $S = S(p) = |\mathcal{S}|$.

Claim 5.9. If event $\mathcal{E}$ does not occur, then $L \ge |\mathcal{P}_0| - S \ge |\mathcal{P}_0| - |Q|$.

Proof. If event $\mathcal{E}$ does not occur, then $p$ is not visible in configuration $C$. By definition of the
adversary, during $H''$ only processes in $(\mathcal{P}_0 - \mathcal{S}) \cup \{p\}$ take steps, and they finish their fetch&inc()
call but don't start their fetch&dec() call.

A process in $\mathcal{S} - \{p\}$ does not "leave a trace" during $H$, i.e., it is never visible: Since it sees $p$,
its only operation occurs during $H'$, and that operation is either a read, a LL, or a failed SC. In
particular, during $H$, no process can see a process in $\mathcal{S} - \{p\}$. In addition, no process $q \in \mathcal{P}_0 - \mathcal{S}$
sees $p$ during $H$: It cannot see $p$ during $H'$, or else it would be in $\mathcal{S}$, and it cannot see $p$ during $H''$,
because $p$ remains invisible until $q$ has finished its fetch&inc() call and is stopped. Moreover,
since the implementation is natural, a process in $\mathcal{P}_0$ cannot see a process in $\overline{\mathcal{P}_0}$. To conclude, no
process in $\mathcal{P}_0 - \mathcal{S}$ sees any process in $\mathcal{S} \cup \overline{\mathcal{P}_0} = \overline{\mathcal{P}_0 - \mathcal{S}}$. Thus, the conditions of Claim 5.8 are
satisfied for $P = \mathcal{P}_0 - \mathcal{S}$, and so $L \ge |\mathcal{P}_0 - \mathcal{S}| = |\mathcal{P}_0| - S$. Because $op_q$ is a read, a LL, or a failed
SC for every process $q \in \mathcal{S}$, we have $S \le |Q|$ and thus $|\mathcal{P}_0| - S \ge |\mathcal{P}_0| - |Q|$. □



5.2.3 Putting Things Together 

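Proof of Lemma 5.6. W.l.o.g. assume $j = 0$. Which operation a process in $\mathcal{P}_0$ executes in its first
shared memory step is independent of the choice of $p \in \mathcal{P}_0$. Thus, since $p$ is uniformly distributed
in $\mathcal{P}_0$, we have

$$\mathrm{Prob}(p \in Q) = \frac{|Q|}{|\mathcal{P}_0|}. \tag{8}$$

Similarly, since by Claim 5.7 $|\mathcal{V}|$ is independent of $p$, and since event $\mathcal{E}$ occurs if and only if $p \in \mathcal{V}$,

$$\mathrm{Prob}(\mathcal{E}) = \mathrm{Prob}(p \in \mathcal{V}) = \frac{|\mathcal{V}|}{|\mathcal{P}_0|}. \tag{9}$$

Applying Claim 5.7 and (9) we obtain

$$E[L \mid \mathcal{E}] \cdot \mathrm{Prob}(\mathcal{E}) \ge \frac{|\mathcal{V}| - 1}{2} \cdot \frac{|\mathcal{V}|}{|\mathcal{P}_0|}. \tag{10}$$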

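If $p \in Q$, then no process sees $p$ during $H'$, i.e., $S = 0$. Moreover, if $p \in Q$, then $p$ is not visible
in configuration $C$ and thus $\mathcal{E}$ does not occur. Consequently, by Claim 5.9 $L \ge |\mathcal{P}_0|$, and thus
applying (8),

$$E[L \mid p \in Q] \cdot \mathrm{Prob}(p \in Q) \ge |\mathcal{P}_0| \cdot \frac{|Q|}{|\mathcal{P}_0|}. \tag{11}$$

If $p \notin Q \cup \mathcal{V}$, then $\mathcal{E}$ does not occur, and by Claim 5.9 $L \ge |\mathcal{P}_0| - |Q|$. Hence, applying the union
bound for (8) and (9),

$$E[L \mid p \notin Q \cup \mathcal{V}] \cdot \mathrm{Prob}(p \notin Q \cup \mathcal{V}) \ge (|\mathcal{P}_0| - |Q|) \cdot \left(1 - \frac{|Q| + |\mathcal{V}|}{|\mathcal{P}_0|}\right) \ge \frac{(|\mathcal{P}_0| - |Q| - |\mathcal{V}|)^2}{|\mathcal{P}_0|}. \tag{12}$$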
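
Now let

$$Z = \max\left\{ |\mathcal{V}|(|\mathcal{V}| - 1)/2,\ |Q| \cdot |\mathcal{P}_0|,\ (|\mathcal{P}_0| - |Q| - |\mathcal{V}|)^2 \right\}.$$

Since either $|\mathcal{V}| \ge |\mathcal{P}_0|/3$ or $|Q| \ge |\mathcal{P}_0|/3$ or $(|\mathcal{P}_0| - |Q| - |\mathcal{V}|) \ge |\mathcal{P}_0|/3$, we have

$$Z \ge \min\left\{ \frac{(|\mathcal{P}_0|/3)(|\mathcal{P}_0|/3 - 1)}{2},\ \frac{|\mathcal{P}_0|^2}{3},\ (|\mathcal{P}_0|/3)^2 \right\} = \Omega(|\mathcal{P}_0|^2).$$

Summing up (10), (11), and (12), we obtain

$$E[L] \ge \frac{1}{|\mathcal{P}_0|} \cdot Z = \Omega(|\mathcal{P}_0|).$$

□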
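
The case distinction behind the bound on $Z$ can be checked exhaustively for small set sizes. The following Python sketch is only an illustration, not part of the proof: the names `z_value`, `z_lower_bound` and `bound_holds` are hypothetical, `n0` stands for $|\mathcal{P}_0|$, `v` for $|\mathcal{V}|$ and `q` for $|Q|$, and it assumes that $\mathcal{V}$ and $Q$ are disjoint subsets of $\mathcal{P}_0$ (writers versus readers), so that `v + q <= n0`.

```python
from fractions import Fraction


def z_value(n0, v, q):
    """Z = max{ v(v-1)/2, q*n0, (n0-q-v)^2 } for given set sizes."""
    return max(Fraction(v * (v - 1), 2),
               Fraction(q * n0),
               Fraction((n0 - q - v) ** 2))


def z_lower_bound(n0):
    """min{ (n0/3)(n0/3-1)/2, n0^2/3, (n0/3)^2 }, computed exactly."""
    third = Fraction(n0, 3)
    return min(third * (third - 1) / 2, Fraction(n0 * n0, 3), third ** 2)


def bound_holds(n0):
    """Check Z >= lower bound for every admissible split (v, q) of n0."""
    bound = z_lower_bound(n0)
    return all(z_value(n0, v, q) >= bound
               for v in range(n0 + 1)
               for q in range(n0 - v + 1))


if __name__ == "__main__":
    print(all(bound_holds(n0) for n0 in range(1, 40)))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.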

We are now ready to prove the main lemma. Up to now, we assumed that the coin flips $c$ are
fixed. In the following we have to consider a random choice of $c$ and a random choice of
$p \in \mathcal{P}$. We use the fact that the number of processes, $|\mathcal{P}_i|$, that choose a counter object $i$ is highly
concentrated around the expectation, $n/m = \sqrt{n}$. Consequently it is unlikely that there is an index
$i$ such that $|\mathcal{P}_i|$ is much larger or smaller than $\sqrt{n}$.

Claim 5.10. Let $\mu = n/m$ and $\delta > 0$ an arbitrarily small constant. For a randomly chosen $c$, with
probability $1 - o(1)$ it is true that

$$\forall\, 0 \le i < m:\ (1 - \delta)\mu \le |\mathcal{P}_i| \le (1 + \delta)\mu.$$

Proof. First consider an arbitrary index $i \in \{0, \ldots, m-1\}$, and let $Y_i = |\mathcal{P}_i|$. Since each process
$q \in \mathcal{P}$ independently chooses an index $i_q \in \{0, \ldots, m-1\}$, the distribution of the random variable
$Y_i$ is identical to that of the binomial random variable $B(n, 1/m)$. From Chernoff bounds and
$\mu = n/m = \sqrt{n}$ we obtain

$$\mathrm{Prob}\big(Y_i < (1 - \delta)\mu \ \vee\ Y_i > (1 + \delta)\mu\big) \le 2 \cdot e^{-\Omega(\mu)} = e^{-\Omega(\sqrt{n})}.$$

The claim now follows immediately from summing up this probability bound for $Y_0, \ldots, Y_{m-1}$. □
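
To get a feeling for the union bound in Claim 5.10, the tail probability of a single $Y_i$ can be evaluated numerically instead of being bounded by Chernoff. The Python sketch below is an illustration under stated assumptions: it takes $n = m^2$ (so $\mu = \sqrt{n} = m$) and assumes each process picks its index uniformly; the function names are hypothetical.

```python
import math


def binomial_pmf(n, p, k):
    """P(B(n, p) = k), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def union_bound_failure(m, delta):
    """Upper bound on P(some |P_i| leaves [(1-delta)mu, (1+delta)mu]).

    Assumes n = m*m processes and m >= 2 counters, so mu = n/m = m.
    """
    n = m * m
    mu = n / m
    low, high = (1 - delta) * mu, (1 + delta) * mu
    # Exact two-sided tail of one bin, then a union bound over all m bins.
    tail = sum(binomial_pmf(n, 1 / m, k)
               for k in range(n + 1) if k < low or k > high)
    return min(1.0, m * tail)


if __name__ == "__main__":
    for m in (10, 20, 40, 80):
        print(m * m, union_bound_failure(m, 0.5))
```

For a fixed $\delta$ the printed bound shrinks as $n$ grows, which is the $1 - o(1)$ behaviour the claim states.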

Proof of Lemma 5.5. Choose $c$ and $p$ uniformly at random and consider the history $H = H(A_p, p, c)$
obtained by a scheduling of the adversary $A_p$. Let $F$ denote the event that

$$(1 - \delta)m \le |\mathcal{P}_0|, \ldots, |\mathcal{P}_{m-1}| \le (1 + \delta)m.$$

Note that whether or not $F$ occurs depends only on $c$ and not on $p$.

Let $L = L(A_p, p, c)$ denote the return value of $p$'s fetch&inc() call. By Lemma 5.6 and since
$F$ is independent of $p$,

$$E[L \mid F] = \Omega(m). \tag{13}$$

By definition of $A_p$, if $p \in \mathcal{P}_j$, then at any point there is at most one process active that is not
in $\mathcal{P}_j$. Hence, if $F$ occurs, then the maximum point contention, $K$, satisfies

$$K \le \max_{0 \le i < m} |\mathcal{P}_i| + 1 \le (1 + \delta)m + 1,$$

and since $K$ is an integer, $K \le \lceil (1 + \delta)m \rceil = K_{\max}$. Hence, given that $F$ occurs, $X(A_p, p, c) = L$,
and so applying Claim 5.10 and (13),

$$E[X(A_p, p, c)] \ge E[X(A_p, p, c) \mid F] \cdot \mathrm{Prob}(F) = E[L \mid F] \cdot (1 - o(1)) = \Omega(m).$$

□

Function R.MRSWwrite(v)
    seq := seq + 1                  // seq is a sequence number initialized to 1
    R_{w-r_1}.write(v, seq)
    R_{w-r_2}.write(v, seq)

Function R.MRSWread                 // code for process r_i, i ∈ {1, 2}
    v[0], s[0] := R_{w-r_i}.read
    v[1], s[1] := R_{r_1-r_i}.read
    v[2], s[2] := R_{r_2-r_i}.read
    let j be such that s[j] = max{s[0], s[1], s[2]}
    R_{r_i-r_1}.write(v[j], s[j])
    R_{r_i-r_2}.write(v[j], s[j])
    return v[j]

Figure 5: Linearizable Implementation of MRSW Registers from SRSW Registers

A Additional Examples 

When, in a randomized distributed algorithm, atomic operations are replaced by their linearizable 
implementations, the scheduler's power can increase. Even when the power of the scheduler of 
the implemented algorithm is constrained significantly from its power in the atomic model, the 
scheduler can still create a probability distribution of computations that is dramatically different 
from what is attainable in the atomic case. 

The following are various additional examples of this phenomenon. 

Implementing multi-reader/single-writer registers from single-reader/single-writer
registers. Let $R$ be a two-reader atomic register initialized to 0, accessed by writer $w$ and
readers $r_1$ and $r_2$ that are executing the code:

$w$: R.MRSWwrite(1); cf := uniform-random{-1, 1}; R.MRSWwrite(cf)
$r_1$: R.MRSWread()
$r_2$: R.MRSWread()

Suppose the strong adversary is trying to minimize the value of $r_1$'s MRSWread. If $R$ is an
atomic register, then the adversary's best strategy is to have $r_1$ execute its MRSWread either
before or after both of $w$'s MRSWwrite operations. In either case the expected value of $r_1$'s
MRSWread is 0.
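
The atomic case is small enough to enumerate. The following Python sketch is a hedged illustration with hypothetical names: it models $r_1$'s read as a single atomic step, lets the adversary either commit to a position before the flip or choose the position after seeing the flip, and computes the smallest expected return value.

```python
from fractions import Fraction


def atomic_read_value(position, cf):
    """Value r1 reads atomically, depending on where the read is placed."""
    if position == "before_first_write":
        return 0       # R still holds its initial value
    if position == "between_writes":
        return 1       # only MRSWwrite(1) has taken effect
    return cf          # "after_second_write": R holds the coin flip


def best_expected_value():
    """Minimum expected value of r1's read over all strong-adversary strategies."""
    half = Fraction(1, 2)
    flips = (-1, 1)
    # Strategies that place the read before the flip cannot depend on cf.
    before_first = Fraction(atomic_read_value("before_first_write", 0))
    before_flip = Fraction(atomic_read_value("between_writes", 0))
    # After the flip, the adversary picks the better position for each outcome.
    after_flip = sum(half * min(atomic_read_value("between_writes", cf),
                                atomic_read_value("after_second_write", cf))
                     for cf in flips)
    return min(before_first, before_flip, after_flip)


if __name__ == "__main__":
    print(best_expected_value())
```

Seeing the flip does not help here: for cf = 1 every position after the flip yields 1, which cancels the gain from cf = -1.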

Now suppose, instead, that $R$ is implemented using the algorithm shown in Figure 5. This
construction uses 6 single-reader/single-writer registers: $R_{w-r_1}$, $R_{w-r_2}$, $R_{r_1-r_1}$, $R_{r_1-r_2}$, $R_{r_2-r_1}$, $R_{r_2-r_2}$,
where the subscript $x$-$y$ denotes a register written by $x$ and read by $y$. Each is initialized to the
initial value of $R$.

Under this implementation, the adversary could schedule as follows: First one step of $r_1$:
$R_{w-r_1}$.read, which will return 0; then all of $w$'s steps (that is, all of its first MRSWwrite, its flip
and its second MRSWwrite). If the value of cf is 1, then the adversary next schedules the rest
of $r_1$'s steps, followed by all of $r_2$'s steps in its R.MRSWread. In this case the implementation of
R.MRSWread by $r_1$ returns 0. If the value of cf is -1, then the adversary next schedules all of $r_2$'s
steps in its R.MRSWread followed by the remainder of $r_1$'s R.MRSWread. In this case, $r_1$ will
discover the updated value of $R$ when it executes $R_{r_2-r_1}$.read, and the implementation returns -1.
Hence the expected value returned by $r_1$ is -1/2.
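
The schedule just described can be replayed step by step. The Python sketch below is an illustration, not the authors' code: it models each single-reader/single-writer register as a dictionary entry holding a (value, sequence number) pair, assumes all registers initially hold (0, 0), and uses sequence numbers 2 and 3 for the two writes (seq starts at 1 and is incremented before each write). All function names are hypothetical.

```python
from fractions import Fraction

READERS = ("r1", "r2")


def fresh_registers():
    """One SRSW register per (writer, reader) pair, holding (value, seq)."""
    return {(x, y): (0, 0) for x in ("w",) + READERS for y in READERS}


def mrsw_write(regs, value, seq):
    """All steps of one MRSWwrite by w."""
    for reader in READERS:
        regs[("w", reader)] = (value, seq)


def mrsw_read(regs, me):
    """Generator: each next() performs one shared-memory step of MRSWread."""
    seen = []
    for source in ("w",) + READERS:
        seen.append(regs[(source, me)])
        yield
    value, seq = max(seen, key=lambda pair: pair[1])
    for reader in READERS:
        regs[(me, reader)] = (value, seq)
        yield
    return value


def finish(read):
    """Run a partially executed read to completion and return its result."""
    try:
        while True:
            next(read)
    except StopIteration as done:
        return done.value


def scheduled_read_of_r1(cf):
    """Replay the adversary's schedule for a given flip; return r1's result."""
    regs = fresh_registers()
    r1 = mrsw_read(regs, "r1")
    next(r1)                      # r1 reads R_{w-r1}, still (0, 0)
    mrsw_write(regs, 1, 2)        # w's first MRSWwrite
    mrsw_write(regs, cf, 3)       # w's second MRSWwrite, after the flip
    if cf == 1:
        result = finish(r1)
        finish(mrsw_read(regs, "r2"))
    else:
        finish(mrsw_read(regs, "r2"))
        result = finish(r1)
    return result


expected = sum(Fraction(scheduled_read_of_r1(cf), 2) for cf in (-1, 1))

if __name__ == "__main__":
    print(expected)
```

The generator makes the read interruptible after every shared-memory step, which is exactly the freedom the adversary exploits.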

Function Q.enq(v)
    pos := tail.fetch&inc()
    item[pos].write(v)

Function Q.deq()
    while True do
        max := tail.read
        for i = 0 ... max - 1 do
            v := item[i].fetch&set
            if v ≠ ⊥ then return v
        end
    end

Figure 6: Implementation of the Herlihy-Wing Queue



Notice that this schedule is available even to the weak adversary. It needs the coin flip value 
only after w has completed all its operations. So reducing the power of the adversary from strong 
to weak does not curtail its power sufficiently to retain the expected behaviour of the algorithm 
when R is an atomic register. 

The algorithm in Figure 5 is the wait-free linearizable implementation of multi-reader/multi-writer
multivalued atomic registers from single-reader/single-writer multivalued atomic registers
using unbounded sequence numbers due to Vitányi and Awerbuch [VA86], specialized to
two readers and one writer. The increased power of the weak adversary in the implementation over
the power of the strong adversary in the atomic case also extends to the case when there is more
than one writer.

Implementing a queue with atomic increment objects. Let $Q$ be a queue object, initially
empty and accessed by $q_0$, $q_1$ and $p$ that are executing the code:

$q_0$: Q.enq(0)
$q_1$: Q.enq(1)

$p$: Q.enq(2); cf := uniform-random{0, 1}; Q.deq; Q.deq; Q.deq

The adversary's goal is to achieve the following: 

(a) all of $p$'s deq operations succeed,

(b) the deq operation that returns 1 precedes the deq operation that returns 2, and

(c) the return value of $p$'s first deq operation equals the result of the flip.

Even if the adversary is strong, in order to achieve (b), it must schedule $q_1$'s enq operation
before $p$'s flip operation. Hence, by the time the flip occurs, the decision whether 0 or 1 is in front
of the queue has been made, and cannot be changed by the adversary. Therefore, the probability
that $p$'s first deq operation returns the value of the flip is at most 1/2.

Herlihy and Wing give a linearizable implementation of a queue using read-modify-write base
objects that support the operations fetch&set, fetch&inc and read [HW90]. The queue is
represented by an unbounded array of items with a tail pointer. The implementation is shown in Figure 6.

Figure 7: A "bad" scheduling for the queue algorithm.



If the above algorithm uses this queue implementation, then the weak adversary could schedule
as shown in Figure 7. Here, the left bar in the drawing of a queue method call denotes that method
call's first shared memory access, and the right bar its last. I.e., the fetch&inc operations of the
enq operations occur in the order $q_0$, $q_1$, $p$, and both $q_1$ and $p$ execute their write before the
flip happens. If the result of the flip is 0, then $q_0$ writes immediately, before any deq operation
starts. Therefore, the first deq operation will return 0. If the result of the flip is 1, then $q_0$'s write
is delayed until after the first deq operation has completed. In this case the first deq operation will
return 1. In either case, the second deq operation will return 0 or 1, and the third will return 2.
Hence, even the weak adversary can achieve with probability 1 that (a), (b), and (c) are satisfied.
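
The schedule of Figure 7 can be replayed on a small model of the queue from Figure 6. The Python sketch below is an illustration with hypothetical names: `BOTTOM` stands for ⊥, each deq is run without interleaving (which suffices for this schedule), and deq is only called when some written item is present, so its outer loop terminates.

```python
BOTTOM = None  # stands for the empty-slot marker ⊥


class HerlihyWingQueue:
    """Sequential model of the array-based queue, exposing its base steps."""

    def __init__(self):
        self.tail = 0
        self.item = {}

    def fetch_inc_tail(self):
        """First shared-memory step of enq: reserve a slot."""
        pos = self.tail
        self.tail += 1
        return pos

    def write_item(self, pos, value):
        """Second shared-memory step of enq: publish the value."""
        self.item[pos] = value

    def deq(self):
        """Scan slots below tail, swapping each with BOTTOM until a value is found."""
        while True:
            limit = self.tail
            for i in range(limit):
                value = self.item.get(i, BOTTOM)
                self.item[i] = BOTTOM
                if value is not BOTTOM:
                    return value


def weak_adversary_run(cf):
    """Replay the schedule for flip result cf; return p's three deq results."""
    q = HerlihyWingQueue()
    pos_q0 = q.fetch_inc_tail()   # fetch&inc steps in the order q0, q1, p
    pos_q1 = q.fetch_inc_tail()
    pos_p = q.fetch_inc_tail()
    q.write_item(pos_q1, 1)       # q1 and p write before the flip
    q.write_item(pos_p, 2)
    if cf == 0:
        q.write_item(pos_q0, 0)   # q0 writes before any deq starts
        first = q.deq()
    else:
        first = q.deq()           # q0's write is delayed past the first deq
        q.write_item(pos_q0, 0)
    return [first, q.deq(), q.deq()]


def goals_met(cf):
    """Check goals (a), (b) and (c) for one outcome of the flip."""
    results = weak_adversary_run(cf)
    all_succeed = len(results) == 3
    one_before_two = results.index(1) < results.index(2)
    return all_succeed and one_before_two and results[0] == cf


if __name__ == "__main__":
    print(weak_adversary_run(0), weak_adversary_run(1))
```

Both outcomes of the flip satisfy all three goals, matching the probability 1 stated above.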